Abstract

Pseudo-marginal Markov chain Monte Carlo methods for sampling from intractable distributions 
have gained recent interest and have been theoretically studied in considerable depth. Their main 
appeal is that they are exact, in the sense that they target marginally the correct invariant distribution.
However, the pseudo-marginal Markov chain can exhibit poor mixing and slow convergence
towards its target. As an alternative, a subtly different Markov chain can be simulated, where better
mixing is possible but the exactness property is sacrificed. This is the noisy algorithm, initially
conceptualised as Monte Carlo within Metropolis (MCWM), which has also been studied but to a 
lesser extent. The present article provides a further characterisation of the noisy algorithm, with a 
focus on fundamental stability properties like positive recurrence and geometric ergodicity. Sufficient 
conditions for inheriting geometric ergodicity from a standard Metropolis-Hastings chain are given, 
as well as convergence of the invariant distribution towards the true target distribution. 

Keywords: Markov chain Monte Carlo; Pseudo-marginal Monte Carlo; Monte Carlo within 
Metropolis; Intractable likelihoods; Geometric ergodicity. 


1 Introduction 

1.1 Intractable target densities and the pseudo-marginal algorithm 

Suppose our aim is to simulate from an intractable probability distribution $\pi$ for some random variable
$X$, which takes values in a measurable space $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$. In addition, let $\pi$ have a density $\pi(x)$ with
respect to some reference measure $\mu(dx)$, e.g. the counting or the Lebesgue measure. By intractable
we mean that an analytical expression for the density $\pi(x)$ is not available and so implementation of a
Markov chain Monte Carlo (MCMC) method targeting $\pi$ is not straightforward.

One possible solution to this problem is to target a different distribution on the extended space
$(\mathcal{X} \times \mathcal{W}, \mathcal{B}(\mathcal{X}) \times \mathcal{B}(\mathcal{W}))$, which admits $\pi$ as marginal distribution. The pseudo-marginal algorithm
(Beaumont 2003, Andrieu and Roberts 2009) falls into this category since it is a Metropolis-Hastings (MH)
algorithm targeting a distribution $\bar{\pi}_N$, associated to the random vector $(X, W)$ defined on the product
space $(\mathcal{X} \times \mathcal{W}, \mathcal{B}(\mathcal{X}) \times \mathcal{B}(\mathcal{W}))$, where $\mathcal{W} \subseteq [0, \infty)$. It is given by

$$\bar{\pi}_N(dx, dw) := \pi(dx)\, Q_{x,N}(dw)\, w, \tag{1}$$

where $\{Q_{x,N}\}_{(x,N) \in \mathcal{X} \times \mathbb{N}^+}$ is a family of probability distributions on $(\mathcal{W}, \mathcal{B}(\mathcal{W}))$ satisfying, for each
$(x, N) \in \mathcal{X} \times \mathbb{N}^+$,

$$\mathbb{E}_{Q_{x,N}}[W_{x,N}] = 1, \quad \text{for } W_{x,N} \sim Q_{x,N}(\cdot). \tag{2}$$

Throughout this article, we restrict our attention to the case where for each $x \in \mathcal{X}$, $W_{x,N}$ is $Q_{x,N}$-a.s.
strictly positive, for reasons that will become clear.

The random variables $\{W_{x,N}\}_{x,N}$ are commonly referred to as the weights. Formalising this algorithm
using (1) and (2) was introduced by Andrieu and Vihola (2015), and “exactness” follows immediately: $\bar{\pi}_N$
admits $\pi$ as a marginal. Given a proposal kernel $q : \mathcal{X} \times \mathcal{B}(\mathcal{X}) \to [0, 1]$, the respective proposal of the
pseudo-marginal is given by

$$\bar{q}_N(x, w; dy, du) := q(x, dy)\, Q_{y,N}(du),$$

and, consequently, the acceptance probability can be expressed as

$$\bar{\alpha}_N(x, w; y, u) := \min\left\{1, \frac{\pi(dy)\, u\, q(y, dx)}{\pi(dx)\, w\, q(x, dy)}\right\}. \tag{3}$$


The pseudo-marginal algorithm defines a time-homogeneous Markov chain, with transition kernel $\bar{P}_N$ on
the measurable space $(\mathcal{X} \times \mathcal{W}, \mathcal{B}(\mathcal{X}) \times \mathcal{B}(\mathcal{W}))$. A single draw from $\bar{P}_N(x, w; \cdot, \cdot)$ is presented in
Algorithm 1.


Algorithm 1 Simulating from $\bar{P}_N(x, w; \cdot, \cdot)$

1. Sample $Y \sim q(x, \cdot)$.

2. Draw $U \sim Q_{Y,N}(\cdot)$.

3. With probability $\bar{\alpha}_N(x, w; Y, U)$ defined in (3):

       return $(Y, U)$,
   otherwise:

       return $(x, w)$.
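Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the names `pseudo_marginal_step`, `log_target`, `sample_weight`, `propose` and `log_q_ratio` are hypothetical, and the weight sampler is assumed to return strictly positive draws with expectation 1, as required by (2). In applications only the product of the target density and the weight (the unbiased estimate) is available; the sketch keeps the two factors separate purely to mirror (3).

```python
import math
import random


def pseudo_marginal_step(x, w, log_target, sample_weight, propose, log_q_ratio, rng):
    """One draw from the pseudo-marginal kernel (Algorithm 1).

    log_target(x)         -- log pi(x), up to an additive constant
    sample_weight(y, rng) -- strictly positive draw from Q_{y,N}, with mean 1
    propose(x, rng)       -- draw Y ~ q(x, .)
    log_q_ratio(x, y)     -- log q(y, x) - log q(x, y); 0 for symmetric proposals
    """
    y = propose(x, rng)                  # step 1: propose a new state
    u = sample_weight(y, rng)            # step 2: weight for the proposal only
    log_ratio = (log_target(y) + math.log(u) + log_q_ratio(x, y)
                 - log_target(x) - math.log(w))
    accept_prob = math.exp(min(0.0, log_ratio))   # acceptance probability (3)
    if rng.random() < accept_prob:       # step 3: accept or reject
        return y, u
    return x, w                          # on rejection the old weight w is kept
```

Note that the current weight `w` is an input and is recycled on rejection, which is the mechanism behind the "sticky" behaviour discussed later.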


Due to its exactness and straightforward implementation in many settings, the pseudo-marginal has 
gained recent interest and has been theoretically studied in some depth, see e.g. Andrieu and Roberts 
(2009), Andrieu and Vihola (2015), Andrieu and Vihola (2014), Sherlock et al. (2015), Girolami et al. 
(2013) and Maire et al. (2014). These studies typically compare the pseudo-marginal Markov chain with 
a “marginal” Markov chain, arising in the case where all the weights are almost surely equal to 1, and 
(3) is then the standard Metropolis-Hastings acceptance probability associated with the target density
$\pi$ and the proposal $q$.


1.2 Examples of pseudo-marginal algorithms 

A common source of intractability for $\pi$ occurs when a latent variable $Z$ on $(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ is used to model
observed data, as in hidden Markov models (HMMs) or mixture models. Although the density $\pi(x)$
cannot be computed, it can be approximated via importance sampling, using an appropriate auxiliary
distribution, say $\nu_x$. Here, appropriate means $\pi_x \ll \nu_x$, where $\pi_x$ denotes the conditional distribution of
$Z$ given $X = x$. Therefore, for this setting, the weights are given by

$$W_{x,N} = \frac{1}{N} \sum_{k=1}^{N} \frac{\pi_x(Z_x^{(k)})}{\nu_x(Z_x^{(k)})}, \quad \text{where } \{Z_x^{(k)}\}_{k \in \{1, \dots, N\}} \overset{\text{i.i.d.}}{\sim} \nu_x(\cdot),$$

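which motivates the following generic form when using averages of unbiased estimators

$$W_{x,N} = \frac{1}{N} \sum_{k=1}^{N} W_x^{(k)}, \quad \text{where } \{W_x^{(k)}\}_{k \in \{1, \dots, N\}} \overset{\text{i.i.d.}}{\sim} Q_x(\cdot) \text{ and } \mathbb{E}_{Q_x}[W_x^{(k)}] = 1. \tag{4}$$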


It is clear that (4) describes only a special case of (2). Nevertheless, we will pay special attention to the 
former throughout the article. For similar settings to (4) see Andrieu and Roberts (2009). 
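The importance-sampling weights above can be illustrated with a toy latent-variable model. The sketch below is hypothetical throughout: it assumes the conditional law of $Z$ given $X = x$ is $\mathcal{N}(x, 1)$ and uses a wider Gaussian $\mathcal{N}(x, 2^2)$ as the auxiliary distribution, so that the density ratio is bounded. The function names are ours.

```python
import math
import random


def normal_pdf(z, mean, sd):
    """Density of a univariate Gaussian with the given mean and standard deviation."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def importance_weight(x, n, rng, aux_sd=2.0):
    """Average of n importance ratios pi_x(Z) / nu_x(Z), with Z drawn from nu_x.

    Assumed toy model: pi_x = N(x, 1) and nu_x = N(x, aux_sd**2).
    Each ratio has expectation 1 under nu_x, hence so does the average.
    """
    total = 0.0
    for _ in range(n):
        z = rng.gauss(x, aux_sd)                                 # Z^(k) ~ nu_x
        total += normal_pdf(z, x, 1.0) / normal_pdf(z, x, aux_sd)
    return total / n
```

Averaging more ratios leaves the expectation at 1 and reduces the variance of the weight, which is the role played by $N$ in (4).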

Since (2) is more general, it allows $W_{x,N}$ to be any random variable with expectation 1. Sequential
Monte Carlo (SMC) methods involve the simulation of a system of some number of particles, and provide
unbiased estimates of likelihoods associated with HMMs (see Del Moral 2004, Proposition 7.4.1 or Pitt
et al. 2012) irrespective of the size of the particle system. Consider the model given by Figure 1.

Figure 1: Hidden Markov Model.

The random variables $\{X_t\}_{t=0}^{T}$ form a time-homogeneous Markov chain with transition $f_\theta(\cdot \mid x_{t-1})$ that depends
on a set of parameters $\theta$. The observed random variables $\{Y_t\}_{t=1}^{T}$ are conditionally independent given
the unobserved $\{X_t\}_{t=1}^{T}$ and are distributed according to $g_\theta(\cdot \mid x_t)$, which also may depend on $\theta$. The
likelihood function for $\theta$ is given by

$$l(\theta; y_1, \dots, y_T) := \mathbb{E}_\theta\left[\prod_{t=1}^{T} g_\theta(y_t \mid X_t)\right],$$

where $\mathbb{E}_\theta$ denotes expectation w.r.t. the $\theta$-dependent law of $\{X_t\}_{t=1}^{T}$, and we assume for simplicity that
the initial value $X_0 = x_0$ is known. If we denote by $\hat{l}_N(\theta; y_1, \dots, y_T)$ the unbiased SMC estimator of
$l(\theta; y_1, \dots, y_T)$ based on $N$ particles, we can then define

$$W_{\theta,N} := \frac{\hat{l}_N(\theta; y_1, \dots, y_T)}{l(\theta; y_1, \dots, y_T)},$$

and (2) is satisfied but (4) is not. The resulting pseudo-marginal algorithm is developed and discussed
in detail in Andrieu et al. (2010), where it and related algorithms are referred to as particle MCMC
methods.
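A common way to obtain such an unbiased SMC estimate is the bootstrap particle filter with multinomial resampling. The sketch below is one possible instance, not necessarily the estimator used by the authors. It assumes, purely for illustration, a linear Gaussian model with transition $\mathcal{N}(a x_{t-1}, \sigma_X^2)$ and observation density $\mathcal{N}(x_t, \sigma_Y^2)$; the function and argument names are hypothetical.

```python
import math
import random


def normal_pdf(y, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bootstrap_likelihood(ys, x0, a, var_x, var_y, n_particles, rng):
    """Bootstrap particle filter estimate of the likelihood of y_1, ..., y_T."""
    particles = [x0] * n_particles
    estimate = 1.0
    for y in ys:
        # Propagate every particle through the assumed transition N(a * x, var_x).
        particles = [rng.gauss(a * x, math.sqrt(var_x)) for x in particles]
        # Weight each particle by the assumed observation density N(x, var_y).
        weights = [normal_pdf(y, x, var_y) for x in particles]
        total = sum(weights)
        if total == 0.0:          # all particles numerically incompatible with y
            return 0.0
        estimate *= total / n_particles
        # Multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimate
```

When the target is a posterior proportional to a prior times the likelihood, the unknown likelihood cancels from the product of target density and weight, so the acceptance ratio (3) only requires the prior and the estimates returned by a routine such as this one.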


1.3 The noisy algorithm 

Although the pseudo-marginal has the desirable property of exactness, it can suffer from “sticky” behaviour,
exhibiting poor mixing and slow convergence towards the target distribution (Andrieu and
Roberts 2009 and Lee and Łatuszyński 2014). The cause for this is well known to be related to the
value of the ratio between $W_{y,N}$ and $W_{x,N}$ at a particular iteration. Heuristically, when the value of
the current weight ($w$ in (3)) is large, proposed moves can have a low probability of acceptance. As
a consequence, the resulting chain can get “stuck” and may not move after a considerable number of
iterations.

In order to overcome this issue, a subtly different algorithm is performed in some practical problems
(see, e.g., McKinley et al., 2014). The basic idea is to refresh, independently from the past, the value
of the current weight at every iteration. The ratio of the weights between $W_{y,N}$ and $W_{x,N}$ still plays
an important role in this alternative algorithm, but here refreshing $W_{x,N}$ at every iteration can improve
mixing and the rate of convergence.

This alternative algorithm is commonly known as Monte Carlo within Metropolis (MCWM), as in
O’Neill et al. (2000), Beaumont (2003) or Andrieu and Roberts (2009), since typically the weights are
Monte Carlo estimates as in (4). From this point onwards it will be referred to as the noisy MH algorithm
or simply the noisy algorithm to emphasise that our main assumption is (2). Due to independence from
previous iterations while sampling $W_{x,N}$ and $W_{y,N}$, the noisy algorithm also defines a time-homogeneous
Markov chain with transition kernel $\tilde{P}_N$, but on the measurable space $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$. A single draw from
$\tilde{P}_N(x, \cdot)$ is presented in Algorithm 2, and it is clear that we restrict our attention to strictly positive
weights because the algorithm is not well-defined when both $W_{y,N}$ and $W_{x,N}$ are equal to 0.


Algorithm 2 Simulating from $\tilde{P}_N(x, \cdot)$

1. Sample $Y \sim q(x, \cdot)$.

2. Draw $W \sim Q_{x,N}(\cdot)$ and $U \sim Q_{Y,N}(\cdot)$, independently.

3. With probability $\bar{\alpha}_N(x, W; Y, U)$ defined in (3):

       return $Y$,
   otherwise:
       return $x$.
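The difference from Algorithm 1 is easiest to see in code. The following is a minimal sketch with hypothetical names (`noisy_mh_step`, `noisy_chain` and their arguments are ours), under the same assumptions as before: a target supplied through its log-density and a weight sampler returning strictly positive, unit-mean draws.

```python
import math
import random


def noisy_mh_step(x, log_target, sample_weight, propose, log_q_ratio, rng):
    """One draw from the noisy kernel (Algorithm 2); only the state x is carried over."""
    y = propose(x, rng)                  # step 1: propose a new state
    w = sample_weight(x, rng)            # step 2: fresh weight for the current state
    u = sample_weight(y, rng)            #         and an independent one for the proposal
    log_ratio = (log_target(y) + math.log(u) + log_q_ratio(x, y)
                 - log_target(x) - math.log(w))
    if rng.random() < math.exp(min(0.0, log_ratio)):   # step 3: accept or reject
        return y
    return x


def noisy_chain(x0, n_iter, log_target, sample_weight, propose, log_q_ratio, seed=0):
    """Run the noisy chain for n_iter steps and return the visited states."""
    rng = random.Random(seed)
    chain = [x0]
    for _ in range(n_iter):
        chain.append(noisy_mh_step(chain[-1], log_target, sample_weight,
                                   propose, log_q_ratio, rng))
    return chain
```

Because no weight is stored between iterations, a single unusually large weight cannot persist and block subsequent moves; the price is that the chain no longer targets $\pi$ exactly.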


Even though these algorithms differ only slightly, the related chains have very different properties.
In Algorithm 2, the value $w$ is generated at every iteration whereas in Algorithm 1, it is treated as an
input. As a consequence, Algorithm 1 produces a chain on $(\mathcal{X} \times \mathcal{W}, \mathcal{B}(\mathcal{X}) \times \mathcal{B}(\mathcal{W}))$, in contrast with the
chain from Algorithm 2, which takes values on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$. However, the noisy chain is not invariant under $\pi$
and it is not reversible in general. Moreover, it may not even have an invariant distribution as shown by
some examples in Section 2.

Figure 2: Estimated densities using the noisy chain with 100,000 iterations for N = 10 (left), N = 100
(central) and N = 1,000 (right). 


From O’Neill et al. (2000) and Fernández-Villaverde and Rubio-Ramírez (2007), it is evident that
the implementation of the noisy algorithm goes back even before the appearance of the pseudo-marginal,
the latter initially conceptualised as Grouped Independence Metropolis-Hastings (GIMH) in Beaumont
(2003). Theoretical properties of the noisy algorithm, however, have mainly been studied in tandem with
the pseudo-marginal by Beaumont (2003), Andrieu and Roberts (2009) and more recently by Alquier
et al. (2014).


1.4 Objectives of the article 

The objectives of this article can be illustrated using a simple example. Let $\mathcal{N}(\cdot \mid \mu, \sigma^2)$ denote a univariate
Gaussian distribution with mean $\mu$ and variance $\sigma^2$ and $\pi(\cdot) = \mathcal{N}(\cdot \mid 0, 1)$ be a standard normal distribution.
Let the weights $W_{x,N}$ be as in (4) with

$$Q_x(\cdot) = \log\mathcal{N}\left(\cdot \,\Big|\, -\frac{\sigma^2}{2}, \sigma^2\right) \quad \text{and} \quad \sigma^2 := 5,$$

where $\log\mathcal{N}(\cdot \mid \mu, \sigma^2)$ denotes a log-normal distribution of parameters $\mu$ and $\sigma^2$. In addition, let the
proposal $q$ be a random walk given by $q(x, \cdot) = \mathcal{N}(\cdot \mid x, 4)$. For this example, Figure 2 shows the estimated
densities using the noisy chain for different values of $N$. It appears that the noisy chain has an invariant
distribution, and as $N$ increases it seems to approach the desired target $\pi$. Our objectives here are to
answer the following types of questions about the noisy algorithm in general: 

1. Does an invariant distribution exist, at least for N large enough? 

2. Does the noisy Markov chain behave like the marginal chain for sufficiently large N? 

3. Does the invariant distribution, if it exists, converge to $\pi$ as $N$ increases?

We will see that the answer to the first two questions is negative in general. However, all three questions 
can be answered positively when the marginal chain is geometrically ergodic and the distributions of the 
weights satisfy additional assumptions. 
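The example of this subsection is small enough to simulate directly. The sketch below is our own illustration (function names and defaults are hypothetical): it runs the noisy chain for a standard normal target, log-normal weights with parameters $(-\sigma^2/2, \sigma^2)$ averaged over $N$ draws, and a Gaussian random walk proposal with variance 4. Setting `sigma2=0.0` makes every weight equal to 1 and recovers the marginal chain.

```python
import math
import random


def averaged_lognormal_weight(n, sigma2, rng):
    """W_{x,N} as in (4): mean of n i.i.d. log-normal draws, each with expectation 1."""
    sd = math.sqrt(sigma2)
    return sum(math.exp(rng.gauss(-sigma2 / 2.0, sd)) for _ in range(n)) / n


def noisy_gaussian_example(n_iter, n, sigma2=5.0, proposal_var=4.0, x0=0.0, seed=0):
    """Noisy chain for a N(0, 1) target with a Gaussian random walk proposal."""
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n_iter):
        y = rng.gauss(x, math.sqrt(proposal_var))       # random walk proposal
        w = averaged_lognormal_weight(n, sigma2, rng)   # refreshed current weight
        u = averaged_lognormal_weight(n, sigma2, rng)   # weight for the proposal
        # Log acceptance ratio; the symmetric proposal densities cancel.
        log_ratio = -0.5 * y * y + math.log(u) + 0.5 * x * x - math.log(w)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y
        chain.append(x)
    return chain
```

Comparing histograms of `noisy_gaussian_example(n_iter, n)` for increasing `n` against the standard normal density gives a rough analogue of Figure 2, although a pure Python loop becomes slow for large `n` and long runs.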


1.5 Marginal chains and geometric ergodicity 

In order to formalise our analysis, let $P$ denote the Markov transition kernel of a standard MH chain
on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$, targeting $\pi$ with proposal $q$. We will refer to this chain and this algorithm using the
term marginal (as in Andrieu and Roberts 2009 and Andrieu and Vihola 2015), which can be seen as
an idealised version for which the noisy chain and corresponding algorithm are simple approximations.
Therefore

$$P(x, dy) := \alpha(x, y)\, q(x, dy) + \delta_x(dy)\, \rho(x),$$

where $\alpha$ is the MH acceptance probability and $\rho$ is the rejection probability, given by

$$\alpha(x, y) := \min\left\{1, \frac{\pi(dy)\, q(y, dx)}{\pi(dx)\, q(x, dy)}\right\} \quad \text{and} \quad \rho(x) := 1 - \int_{\mathcal{X}} \alpha(x, y)\, q(x, dy). \tag{5}$$


Similarly, for the transition kernel $\tilde{P}_N$ of the noisy chain, moves are proposed according to $q$ but are
accepted using $\bar{\alpha}_N$ (as in (3)) instead of $\alpha$, once values for $W_{x,N}$ and $W_{y,N}$ are sampled. In order to
distinguish the acceptance probabilities between the noisy and the pseudo-marginal processes, despite
being the same after sampling values for the weights, define

$$\tilde{\alpha}_N(x, y) := \mathbb{E}_{Q_{x,N} \otimes Q_{y,N}}\left[\bar{\alpha}_N(x, W_{x,N}; y, W_{y,N})\right]. \tag{6}$$

Here $\tilde{\alpha}_N$ is the expectation of a randomised acceptance probability, which permits defining the transition
kernel of the noisy chain by

$$\tilde{P}_N(x, dy) := \tilde{\alpha}_N(x, y)\, q(x, dy) + \delta_x(dy)\, \tilde{\rho}_N(x),$$

where $\tilde{\rho}_N$ is the noisy rejection probability given by

$$\tilde{\rho}_N(x) := 1 - \int_{\mathcal{X}} \tilde{\alpha}_N(x, y)\, q(x, dy). \tag{7}$$

The noisy kernel $\tilde{P}_N$ is just a perturbed version of $P$, involving a ratio of weights in the noisy acceptance
probability $\tilde{\alpha}_N$. In addition, when such weights are identically one, i.e. $Q_{x,N}(\{1\}) = 1$, the noisy chain
reduces to the marginal chain, whereas the pseudo-marginal becomes the marginal chain with an extra
component always equal to 1.

So far, the terms slow convergence and “sticky” behaviour have been used in a relatively vague sense. A
powerful characterisation of the behaviour of a Markov chain is provided by geometric ergodicity, defined
below. Geometrically ergodic Markov chains have a limiting invariant probability distribution, which
they converge towards geometrically fast in total variation (Meyn and Tweedie, 2009). For any Markov
kernel $K : \mathcal{X} \times \mathcal{B}(\mathcal{X}) \to [0, 1]$, let $K^n$ be the $n$-step transition kernel, which is given by

$$K^n(x, \cdot) := \int_{\mathcal{X}} K^{n-1}(x, dz)\, K(z, \cdot), \quad \text{for } n \geq 2.$$

Definition 1.1 (Geometric ergodicity). A $\varphi$-irreducible and aperiodic Markov chain $\Phi := (\Phi_i)_{i \geq 0}$ on
a measurable space $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$, with transition kernel $P$ and invariant distribution $\pi$, is geometrically
ergodic if there exists a finite function $V \geq 1$ and constants $r < 1$, $R < \infty$ such that

$$\|P^n(x, \cdot) - \pi(\cdot)\|_{TV} \leq R\, V(x)\, r^n, \quad \text{for } x \in \mathcal{X}. \tag{8}$$

Here, $\|\cdot\|_{TV}$ denotes the total variation norm given by

$$\|\mu\|_{TV} = \frac{1}{2} \sup_{|g| \leq 1} \left| \int \mu(dy)\, g(y) \right| = \sup_{A \in \mathcal{B}(\mathcal{X})} \mu(A),$$

where $\mu$ is any signed measure.

Geometric ergodicity does not necessarily provide fast convergence in an absolute sense. For instance, 
consider cases where r, or R , from Definition 1.1 are extremely close to one, or very large respectively. 
Then the decay of the total variation distance, though geometric, is not particularly fast (see Roberts 
and Rosenthal 2004 for some examples). 

Nevertheless, geometric ergodicity is a useful tool when analysing non-reversible Markov chains as
will become apparent in the noisy chain case. Moreover, in practice one is often interested in estimating
$\mathbb{E}_\pi[f(X)]$ for some function $f : \mathcal{X} \to \mathbb{R}$, which is done by using ergodic averages of the form

$$e_{n,m}(f) := \frac{1}{n} \sum_{i=m+1}^{m+n} f(\Phi_i), \quad \text{for } m, n > 0.$$

In this case, geometric ergodicity is a desirable property since it can guarantee the existence of a central
limit theorem (CLT) for $e_{n,m}(f)$, see Chan and Geyer (1994), Roberts and Rosenthal (1997) and Roberts
and Rosenthal (2004) for a more general review. Also, its importance is related to the construction of
consistent estimators of the corresponding asymptotic variance in the CLT, as in Flegal and Jones (2010).
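In code, the ergodic average is a windowed mean that discards a burn-in of length $m$. The helper below is a small sketch with a hypothetical name; it assumes the chain is stored as a list whose entry `i` is the state at time $i$, so that entry 0 is the initial state.

```python
def ergodic_average(chain, f, m, n):
    """Return (1/n) * sum of f(chain[i]) for i = m+1, ..., m+n.

    chain[i] is taken to be the state at time i, so chain[0] is the initial state.
    """
    if m < 0 or n < 1:
        raise ValueError("need m >= 0 and n >= 1")
    if m + n >= len(chain):
        raise ValueError("chain is too short for the requested window")
    return sum(f(chain[i]) for i in range(m + 1, m + n + 1)) / n
```

For example, `ergodic_average(chain, lambda v: v * v, 1000, 50000)` would estimate a second moment from 50,000 states after discarding the first 1,000 transitions.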

As noted in Andrieu and Roberts (2009), if the weights $\{W_{x,N}\}_{x,N}$ are not essentially bounded
then the pseudo-marginal chain cannot be geometrically ergodic; in such cases the “stickiness” may be
more evident. In addition, under mild assumptions (in particular, that $\bar{P}_N$ has a left spectral gap),
from Andrieu and Vihola (2015, Proposition 10) and Lee and Łatuszyński (2014), a sufficient but not
necessary condition ensuring the pseudo-marginal inherits geometric ergodicity from the marginal is that
the weights are uniformly bounded. This certainly imposes a tight restriction in many practical problems.

The analyses in Andrieu and Roberts (2009) and Alquier et al. (2014) mainly study the noisy algorithm
in the case where the marginal Markov chain is uniformly ergodic, i.e. when it satisfies (8) with
$\sup_{x \in \mathcal{X}} V(x) < \infty$. However, there are many Metropolis-Hastings Markov chains for statistical estimation
that cannot be uniformly ergodic, e.g. random walk Metropolis chains when $\pi$ is not compactly supported.
Our focus is therefore on inheritance of geometric ergodicity by the noisy chain, complementing
existing results for the pseudo-marginal chain.

1.6 Outline of the paper 

In Section 2, some simple examples are presented for which the noisy chain is positive recurrent, so it 
has an invariant probability distribution. This is perhaps the weakest stability property that one would 
expect a Monte Carlo Markov chain to have. However, other fairly surprising examples are presented 
for which the noisy Markov chain is transient even though the marginal and pseudo-marginal chains are 
geometrically ergodic. Section 3 is dedicated to inheritance of geometric ergodicity from the marginal 
chain, where two different sets of sufficient conditions are given and are further analysed in the context 
of arithmetic averages given by (4). Once geometric ergodicity is attained, it guarantees the existence of 
an invariant distribution $\tilde{\pi}_N$ for the noisy chain. Under the same sets of conditions, we show in Section 4
that $\tilde{\pi}_N$ and $\pi$ can be made arbitrarily close in total variation as $N$ increases. Moreover, explicit rates of
convergence can in principle be obtained when the weights arise from an arithmetic average setting
as in (4).

2 Motivating examples 

2.1 Homogeneous weights with a random walk proposal 

Assume a log-concave target distribution $\pi$ on the positive integers, whose density with respect to the
counting measure is given by

$$\pi(m) \propto \exp\{-h(m)\}\, \mathbb{1}_{\{m \in \mathbb{N}^+\}},$$

where $h : \mathbb{N}^+ \to \mathbb{R}$ is a convex function. In addition, let the proposal distribution be a symmetric random
walk on the integers, i.e.

$$q(m, \{m+1\}) = \frac{1}{2} = q(m, \{m-1\}), \quad \text{for } m \in \mathbb{Z}. \tag{9}$$


From Mengersen and Tweedie (1996), it can be seen that the marginal chain is geometrically ergodic. 

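Now, assume the distribution of the weights $\{W_{m,N}\}_{m,N}$ is homogeneous with respect to the state
space, meaning

$$W_{m,N} = W_N \sim Q_N(\cdot), \quad \text{for all } m \in \mathbb{N}^+. \tag{10}$$

In addition, assume $W_N > 0$ $Q_N$-a.s., then for $m \geq 2$

$$\tilde{P}_N(m, \{m-1\}) = \frac{1}{2}\, \mathbb{E}_{Q_N \otimes Q_N}\left[\min\left\{1, \frac{\exp\{h(m)\}}{\exp\{h(m-1)\}} \frac{W_N^{(1)}}{W_N^{(2)}}\right\}\right] \quad \text{and}$$

$$\tilde{P}_N(m, \{m+1\}) = \frac{1}{2}\, \mathbb{E}_{Q_N \otimes Q_N}\left[\min\left\{1, \frac{\exp\{h(m)\}}{\exp\{h(m+1)\}} \frac{W_N^{(1)}}{W_N^{(2)}}\right\}\right],$$

where $\{W_N^{(k)}\}_{k \in \{1, 2\}} \overset{\text{i.i.d.}}{\sim} Q_N(\cdot)$.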


For this particular class of weights and using the fact that h is convex, the noisy chain is geometrically 
ergodic, implying the existence of an invariant probability distribution. 

Figure 3: Last 20,000 iterations of the marginal algorithm for the autoregressive parameter a (top).
Estimated autocorrelation function of the corresponding marginal chain (bottom). The mean acceptance 
probability was 0.256. 

Proposition 2.1. Consider a log-concave target density on the positive integers and a proposal density
as in (9). In addition, let the distribution of the weights be homogeneous as in (10). Then, the chain
generated by the noisy kernel $\tilde{P}_N$ is geometrically ergodic.

It is worth noting that the distribution of the weights, though homogeneous with respect to the
state space, can be taken arbitrarily, as long as the weights are positive. Homogeneity ensures that the
distribution of the ratio of such weights is not concentrated near 0, due to its symmetry around one, i.e.
for $z > 0$

$$\mathbb{P}_{Q_N \otimes Q_N}\left[\frac{W_N^{(1)}}{W_N^{(2)}} \leq z\right] = \mathbb{P}_{Q_N \otimes Q_N}\left[\frac{W_N^{(1)}}{W_N^{(2)}} \geq \frac{1}{z}\right], \quad \text{where } \{W_N^{(k)}\}_{k \in \{1, 2\}} \overset{\text{i.i.d.}}{\sim} Q_N(\cdot).$$

In contrast, when the support of the distribution $Q_N$ is unbounded, the corresponding pseudo-marginal
chain cannot be geometrically ergodic.

2.2 Particle MCMC 

More complex examples arise when using particle MCMC methods, for which noisy versions can also be 
performed. They may prove to be useful in some inference problems. Consider again the hidden Markov 
model given by Figure 1. As before, set $X_0 = x_0$ and let

$$\theta = \{x_0, a, \sigma_X^2, \sigma_Y^2\},$$

$$f_\theta(\cdot \mid X_{t-1}) = \mathcal{N}(\cdot \mid a X_{t-1}, \sigma_X^2) \quad \text{and}$$

$$g_\theta(\cdot \mid X_t) = \mathcal{N}(\cdot \mid X_t, \sigma_Y^2).$$

Therefore, once a prior distribution for $\theta$ is specified, $p(\cdot)$ say, the aim is to conduct Bayesian inference
on the posterior distribution

$$\pi(\theta \mid y_1, \dots, y_T) \propto p(\theta)\, l(\theta; y_1, \dots, y_T).$$

In this particular setting, the posterior distribution is tractable. This allows us to compare the
results obtained from the exact and noisy versions, both relying on the SMC estimator $\hat{l}_N(\theta; y_1, \dots, y_T)$ of
the likelihood. Using a uniform prior for the parameters and a random walk proposal, Figure 3 shows the
run and autocorrelation function (acf) for the autoregressive parameter $a$ of the marginal chain. Similarly,
Figure 4 shows the corresponding run and acf for both the pseudo-marginal and the noisy chain when
$N = 250$. It is noticeable how the pseudo-marginal gets “stuck”, resulting in a lower acceptance rate than the
marginal and noisy chains. In addition, the acf of the noisy chain seems to decay faster than that of the
pseudo-marginal chain.

Finally, Figure 5 and Figure 6 show the estimated posterior densities for the parameters when $N = 250$
and $N = 750$, respectively. There, the trade-off between the pseudo-marginal and the noisy algorithm is
noticeable. For lower values of $N$, the pseudo-marginal will require more iterations due to the slow mixing,
whereas the noisy chain converges faster towards an unknown noisy invariant distribution. By increasing $N$,
the mixing in the pseudo-marginal improves and the noisy invariant distribution approaches the true posterior.

Figure 4: Last 20,000 iterations of the pseudo-marginal (top left) and noisy (bottom left) algorithms, for
the autoregressive parameter a when N = 250. Estimated autocorrelation functions of the corresponding
pseudo-marginal (top right) and noisy (bottom right) chains. The mean acceptance probabilities were
0.104 for the pseudo-marginal and 0.283 for the noisy chain.

Figure 5: Estimated densities using the marginal, pseudo-marginal and noisy chains for the 4 parameters
when N = 250. Vertical lines indicate the real values.

Figure 6: Estimated densities using the marginal, pseudo-marginal and noisy chains for the 4 parameters,
when N = 750. Vertical lines indicate the real values.

2.3 Transient noisy chain with homogeneous weights


In contrast with the example in Section 2.1, this one shows that the noisy algorithm can produce a transient
chain even in simple settings. Let $\pi$ be a geometric distribution on the positive integers, whose density
with respect to the counting measure is given by

$$\pi(m) = \left(\frac{1}{2}\right)^{m} \mathbb{1}_{\{m \in \mathbb{N}^+\}}. \tag{11}$$

In addition, assume the proposal distribution is a simple random walk on the integers, i.e.

$$q(m, \{m+1\}) = \theta = 1 - q(m, \{m-1\}), \quad \text{for } m \in \mathbb{Z}, \tag{12}$$

where $\theta \in (0, 1)$. Under these assumptions, the marginal chain is geometrically ergodic, see Proposition A.1
in Appendix A.

Consider $N = 1$ and, as in Section 2.1, let the distribution of weights be homogeneous and given by

$$W = (b - \varepsilon)\, \mathrm{Ber}(s) + \varepsilon, \quad \text{for } b > 1 \text{ and } \varepsilon \in (0, 1), \tag{13}$$

where $\mathrm{Ber}(s)$ denotes a Bernoulli random variable of parameter $s \in (0, 1)$. There exists a relationship
between $s$, $b$ and $\varepsilon$ that guarantees the expectation of the weights is identically one, namely
$s = (1 - \varepsilon)/(b - \varepsilon)$. The following
proposition, proven in Appendix A by taking $\theta > 1/2$, shows that the resulting noisy chain can be
transient for certain values of $b$, $\varepsilon$ and $\theta$.

Proposition 2.2. Consider a geometric target density as in (11) and a proposal density as in (12). In
addition, let the weights when $N = 1$ be given by (13). Then, for some $b$, $\varepsilon$ and $\theta$ the chain generated by
the noisy kernel $\tilde{P}_{N=1}$ is transient.

In contrast, since the weights are uniformly bounded by $b$, the pseudo-marginal chain inherits geometric
ergodicity for any $\theta$, $b$ and $\varepsilon$. The left plot in Figure 7 shows an example. We will discuss the
behaviour of this example as $N$ increases in Section 3.4.
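Because the weights in (13) take only two values, the one-step probabilities of the noisy chain away from the boundary can be computed exactly. The sketch below does so under stated assumptions: the target is taken to be $\pi(m) = 2^{-m}$ on the positive integers, $\theta = 0.75$ and $\varepsilon = 2 - \sqrt{3}$ are the values quoted in the caption of Figure 7, and $b = 6\varepsilon$ is an illustrative choice of ours rather than a value confirmed by the text. All function names are hypothetical.

```python
import math
from itertools import product


def bernoulli_weight_law(b, eps):
    """Two-point law of W in (13) as [(value, probability), ...], with E[W] = 1."""
    s = (1.0 - eps) / (b - eps)
    return [(b, s), (eps, 1.0 - s)]


def noisy_move_probs(theta, b, eps):
    """(P(m -> m-1), P(m -> m+1)) under the noisy kernel for any state m >= 2."""
    law = bernoulli_weight_law(b, eps)
    right_ratio = 0.5 * (1.0 - theta) / theta    # pi(m+1) q(m+1, m) / (pi(m) q(m, m+1))
    left_ratio = 2.0 * theta / (1.0 - theta)     # pi(m-1) q(m-1, m) / (pi(m) q(m, m-1))
    acc_left = acc_right = 0.0
    for (w, pw), (u, pu) in product(law, law):   # w: current weight, u: proposed weight
        acc_right += pw * pu * min(1.0, right_ratio * u / w)
        acc_left += pw * pu * min(1.0, left_ratio * u / w)
    return (1.0 - theta) * acc_left, theta * acc_right


def marginal_move_probs(theta):
    """The same two probabilities for the marginal chain (all weights equal to 1)."""
    return ((1.0 - theta) * min(1.0, 2.0 * theta / (1.0 - theta)),
            theta * min(1.0, 0.5 * (1.0 - theta) / theta))
```

With these values the noisy chain moves right slightly more often than left from every state $m \geq 2$, whereas the marginal chain moves left twice as often as right. A nearest-neighbour walk on the positive integers whose rightward probability exceeds its leftward probability by a fixed margin away from the boundary is transient, which is consistent with Proposition 2.2.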


2.4 Transient noisy chain with non-homogeneous weights 

One could argue that the transient behaviour of the previous example is related to the large value of $\theta$ in
the proposal distribution. However, as shown here, for any value of $\theta \in (0, 1)$ one can construct weights
satisfying (2) for which the noisy chain is transient. With the same assumptions as in the example in
Section 2.3, except that now the distribution of weights is not homogeneous but given by

$$W_{m,1} = (b - \varepsilon_m)\, \mathrm{Ber}(s_m) + \varepsilon_m, \quad \text{for } b > 1 \text{ and } \varepsilon_m = m^{-(3 - (m \bmod 3))}, \tag{14}$$

the noisy chain will be transient for $b$ large enough. The proof can be found in Appendix A.

Proposition 2.3. Consider a geometric target density as in (11) and a proposal density as in (12). In
addition, let the weights when $N = 1$ be given by (14). Then, for any $\theta \in (0, 1)$ there exists some $b > 1$
such that the chain generated by the noisy kernel $\tilde{P}_{N=1}$ is transient.

The reason for this becomes apparent when looking at the behaviour of the ratios of weights. Even
though $\varepsilon_m \to 0$ as $m \to \infty$, the non-monotonic behaviour of the sequence implies

$$\frac{\varepsilon_{m-1}}{\varepsilon_m} \in \begin{cases} O(m^{2}) & m \ (\mathrm{mod}\ 3) = 0, \\ O(m^{-1}) & m \ (\mathrm{mod}\ 3) \in \{1, 2\}, \end{cases} \quad \text{and} \quad \frac{\varepsilon_{m+1}}{\varepsilon_m} \in \begin{cases} O(m^{-2}) & m \ (\mathrm{mod}\ 3) = 2, \\ O(m) & m \ (\mathrm{mod}\ 3) \in \{0, 1\}. \end{cases}$$

Hence, the ratio of the weights can become arbitrarily large or arbitrarily close to zero with a non-negligible
probability. This allows the algorithm to accept moves to the right more often, if $m$ is large
enough. Once again, the pseudo-marginal chain inherits the geometrically ergodic property from the
marginal. See the central and right plots of Figure 7 for two examples using different proposals. Again,
we will come back to this example in Section 3.4, where we look at the behaviour of the associated noisy
chain as $N$ increases.
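The orders of magnitude of these ratios can be checked numerically. The snippet below is a sketch that assumes the sequence reads $\varepsilon_m = m^{-(3 - (m \bmod 3))}$, which is our reading of (14) chosen to agree with the ratios discussed above; the helper names are hypothetical.

```python
def eps(m):
    """epsilon_m = m ** -(3 - (m mod 3)): the assumed reading of (14)."""
    return float(m) ** -(3 - m % 3)


def ratio_left(m):
    """epsilon_{m-1} / epsilon_m, the ratio relevant for a move to the left."""
    return eps(m - 1) / eps(m)


def ratio_right(m):
    """epsilon_{m+1} / epsilon_m, the ratio relevant for a move to the right."""
    return eps(m + 1) / eps(m)
```

Under this reading, two out of every three states see a rightward ratio growing linearly in $m$, while the leftward ratio shrinks like $1/m$ for two out of every three states.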

Figure 7: Runs of the marginal, pseudo-marginal and noisy chains. Left plot shows the example in Section 2.3,
where $\theta = 0.75$ and $\varepsilon = 2 - \sqrt{3}$. Central and right plots show the example in Section 2.4, where
$\theta = 0.5$ and $\theta = 0.25$ respectively, with $\varepsilon_m = m^{-(3 - (m \bmod 3))}$.

3 Inheritance of ergodic properties 

The inheritance of various ergodic properties of the marginal chain by pseudo-marginal Markov chains 
has been established using techniques that are powerful but suitable only for reversible Markov chains 
(see, e.g. Andrieu and Vihola, 2015). Since the noisy Markov chains treated here can be non-reversible, a 
suitable tool for establishing geometric ergodicity is the use of Foster-Lyapunov functions, via geometric 
drift towards a small set. 

Definition 3.1 (Small set). Let $P$ be the transition kernel of a Markov chain $\Phi$. A subset $C \subseteq \mathcal{X}$ is
small if there exists a positive integer $n_0$, $\varepsilon > 0$ and a probability measure $\nu(\cdot)$ on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ such that
the following minorisation condition holds

$$P^{n_0}(x, \cdot) \geq \varepsilon\, \nu(\cdot), \quad \text{for } x \in C. \tag{15}$$

The following theorem, which is immediate from combining Roberts and Rosenthal (1997, Proposition 2.1)
and Meyn and Tweedie (2009, Theorem 15.0.1), establishes the equivalence between geometric
ergodicity and a geometric drift condition. For any kernel $K : \mathcal{X} \times \mathcal{B}(\mathcal{X}) \to [0, 1]$, let

$$KV(x) := \int_{\mathcal{X}} K(x, dz)\, V(z).$$

Theorem 3.1. Suppose that $\Phi$ is a $\varphi$-irreducible and aperiodic Markov chain with transition kernel $P$
and invariant distribution $\pi$. Then, the following statements are equivalent:

i. There exists a small set $C$, constants $\lambda < 1$ and $b < \infty$, and a function $V \geq 1$ finite for some
$x_0 \in \mathcal{X}$ satisfying the geometric drift condition

$$PV(x) \leq \lambda V(x) + b\, \mathbb{1}_{\{x \in C\}}, \quad \text{for } x \in \mathcal{X}. \tag{16}$$

ii. The chain is $\pi$-a.e. geometrically ergodic, meaning that for $\pi$-a.e. $x \in \mathcal{X}$ it satisfies (8) for some
$V \geq 1$ (which can be taken as in (i)) and constants $r < 1$, $R < \infty$.
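A drift condition of the form (16) can be checked numerically for a concrete chain. The sketch below does this for the marginal chain of Section 2.3 under assumptions of ours: the target is taken to be $\pi(m) = 2^{-m}$ on the positive integers, $\theta = 0.75$, the trial drift function is $V(m) = \beta^m$ with $\beta = \sqrt{2}$, and the small set is the singleton $C = \{1\}$. None of these choices is taken from the proofs in Appendix A.

```python
def marginal_kernel_row(m, theta=0.75):
    """Transition probabilities {state: probability} of the marginal chain from m >= 1."""
    p_right = theta * min(1.0, 0.5 * (1.0 - theta) / theta)
    # A proposed move from 1 to 0 is always rejected because pi(0) = 0.
    p_left = (1.0 - theta) * min(1.0, 2.0 * theta / (1.0 - theta)) if m >= 2 else 0.0
    return {m - 1: p_left, m: 1.0 - p_left - p_right, m + 1: p_right}


def drift_ratio(m, beta, theta=0.75):
    """PV(m) / V(m) for the trial drift function V(m) = beta ** m."""
    row = marginal_kernel_row(m, theta)
    return sum(p * beta ** (k - m) for k, p in row.items())
```

For every $m \geq 2$ the ratio is the same constant, roughly 0.979, so (16) holds with $\lambda$ equal to that constant outside $C = \{1\}$; at $m = 1$ the ratio exceeds one and the excess is absorbed by the constant $b$.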

From this point onwards, it is assumed that the marginal and noisy chains are $\varphi$-irreducible and
aperiodic. In addition, for many of the following results, it is required that

(P1) The marginal chain is geometrically ergodic, implying its kernel $P$ satisfies the geometric
drift condition in (16) for some constants $\lambda < 1$ and $b < \infty$, some function $V \geq 1$ and a
small set $C \subseteq \mathcal{X}$.

3.1 Conditions involving a negative moment 

From the examples of the previous section, it is clear that the weights play a fundamental role in the
behaviour of the noisy chain. The following theorem states that the noisy chain will inherit geometric 
ergodicity from the marginal under some conditions on the weights involving a strengthened version of 
the Law of Large Numbers and convergence of negative moments. 

(W1) For any $\delta > 0$, the weights $\{W_{x,N}\}_{x,N}$ satisfy

$$\lim_{N \to \infty} \sup_{x \in \mathcal{X}} \mathbb{P}_{Q_{x,N}}\left[|W_{x,N} - 1| \geq \delta\right] = 0.$$

(W2) The weights $\{W_{x,N}\}_{x,N}$ satisfy

$$\lim_{N \to \infty} \sup_{x \in \mathcal{X}} \mathbb{E}_{Q_{x,N}}\left[W_{x,N}^{-1}\right] = 1.$$


Theorem 3.2. Assume (P1), (W1) and (W2). Then, there exists $N_0 \in \mathbb{N}^+$ such that for all $N \geq N_0$,
the noisy chain with transition kernel $\tilde{P}_N$ is geometrically ergodic.
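For weights formed as arithmetic averages as in (4), a one-line Chebyshev argument indicates when (W1) can be expected to hold. The display below is a sketch under an added assumption that is not made in the text above, namely that the single-draw variances are uniformly bounded, $v := \sup_{x} \operatorname{Var}_{Q_x}(W_x^{(1)}) < \infty$:

```latex
\sup_{x \in \mathcal{X}} \mathbb{P}_{Q_{x,N}}\big[\, |W_{x,N} - 1| \geq \delta \,\big]
  \;\leq\; \sup_{x \in \mathcal{X}} \frac{\operatorname{Var}(W_{x,N})}{\delta^{2}}
  \;=\; \sup_{x \in \mathcal{X}} \frac{\operatorname{Var}_{Q_x}\!\big(W_x^{(1)}\big)}{N \delta^{2}}
  \;\leq\; \frac{v}{N \delta^{2}} \;\xrightarrow[N \to \infty]{}\; 0 .
```

Weaker moment conditions may also suffice for (W1). Condition (W2) is of a different nature, since it requires control of negative moments of the weights.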

The above result is obtained by controlling the dissimilarity of the marginal and noisy kernels. This is 
done by looking at the corresponding rejection and acceptance probabilities. The proofs of the following 
lemmas appear in Appendix A. 

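Lemma 3.1. For any $\delta > 0$

$$\mathbb{P}_{Q_{z,N} \otimes Q_{x,N}}\left[\frac{W_{z,N}}{W_{x,N}} \leq 1 - \delta\right] \leq 2 \sup_{x \in \mathcal{X}} \mathbb{P}_{Q_{x,N}}\left[|W_{x,N} - 1| \geq \frac{\delta}{2}\right].$$

Lemma 3.2. Let $\rho(x)$ and $\tilde{\rho}_N(x)$ be the rejection probabilities as defined in (5) and (7) respectively.
Then, for any $\delta > 0$

$$\tilde{\rho}_N(x) - \rho(x) \leq \delta + 2 \sup_{x \in \mathcal{X}} \mathbb{P}_{Q_{x,N}}\left[|W_{x,N} - 1| \geq \frac{\delta}{2}\right].$$

Lemma 3.3. Let $\alpha(x, y)$ and $\tilde{\alpha}_N(x, y)$ be the acceptance probabilities as defined in (5) and (6) respectively.
Then,

$$\tilde{\alpha}_N(x, y) \leq \alpha(x, y)\, \mathbb{E}_{Q_{x,N}}\left[W_{x,N}^{-1}\right].$$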

Notice that (W1) and (W2) allow control on the bounds in the above lemmas. While Lemma 3.2
provides a bound for the difference of the rejection probabilities, Lemma 3.3 gives one for the ratio of the 
acceptance probabilities. The proof of Theorem 3.2 is now presented. 

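Proof of Theorem 3.2. Since the marginal chain $P$ is geometrically ergodic, it satisfies the geometric
drift condition in (16) for some $\lambda < 1$, $b < \infty$, some function $V \geq 1$ and a small set $C \subseteq \mathcal{X}$. Now, using
the above lemmas

$$\begin{aligned}
\tilde{P}_N V(x) - PV(x) &= \int_{\mathcal{X}} q(x, dz) \left(\tilde{\alpha}_N(x, z) - \alpha(x, z)\right) V(z) + V(x) \left(\tilde{\rho}_N(x) - \rho(x)\right) \\
&\leq \left(\sup_{x \in \mathcal{X}} \mathbb{E}_{Q_{x,N}}\left[W_{x,N}^{-1}\right] - 1\right) PV(x) + \left(\delta + 2 \sup_{x \in \mathcal{X}} \mathbb{P}_{Q_{x,N}}\left[|W_{x,N} - 1| \geq \frac{\delta}{2}\right]\right) V(x).
\end{aligned}$$

By (W1) and (W2), for any $\varepsilon, \delta > 0$ there exists $N_0 \in \mathbb{N}^+$ such that

$$\sup_{x \in \mathcal{X}} \mathbb{P}_{Q_{x,N}}\left[|W_{x,N} - 1| \geq \frac{\delta}{2}\right] < \frac{\varepsilon}{4} \quad \text{and} \quad \sup_{x \in \mathcal{X}} \mathbb{E}_{Q_{x,N}}\left[W_{x,N}^{-1}\right] - 1 < \varepsilon,$$

whenever $N \geq N_0$, implying

$$\begin{aligned}
\tilde{P}_N V(x) &\leq PV(x) + \varepsilon PV(x) + \left(\delta + \frac{\varepsilon}{2}\right) V(x) \\
&\leq \left(\lambda + \delta + \varepsilon \left(\lambda + \frac{1}{2}\right)\right) V(x) + b (1 + \varepsilon)\, \mathbb{1}_{\{x \in C\}}.
\end{aligned}$$

Taking $\delta$ and $\varepsilon$ small enough that $\lambda + \delta + \varepsilon(\lambda + \frac{1}{2}) < 1$ (for instance, $\delta = \frac{1 - \lambda}{4}$ and
$\varepsilon \in (0, \frac{1 - \lambda}{1 + 2\lambda})$), the noisy chain $\tilde{P}_N$ also satisfies a geometric drift condition for the same
function $V$ and small set $C$, completing the proof. □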

3.2 Conditions on the proposal distribution

In this subsection a different bound for the acceptance probabilities is provided, which allows dropping 
assumption (W2) but imposes a different one on the proposal q instead. 

(P1*) (P1) holds and for the same drift function $V$ in (P1) there exists $K < \infty$ such that
the proposal kernel $q$ satisfies

$$qV(x) \leq K V(x), \quad \text{for } x \in \mathcal{X}.$$

Theorem 3.3. Assume (P1*) and (W1). Then, there exists $N_0 \in \mathbb{N}^+$ such that for all $N \geq N_0$, the
noisy chain with transition kernel $\tilde{P}_N$ is geometrically ergodic.

In order to prove Theorem 3.3 the following lemma is required. Its proof can be found in Appendix A. 
In contrast with Lemma 3.3, this lemma provides a bound for the additive difference of the noisy and 
marginal acceptance probabilities. 

Lemma 3.4. Let a{x,y) and &n{ x,y) be the acceptance probabilities as defined in (5) and ( 6 ), respec¬ 
tively. Then, for any 77 > 0 


a N {x, y) - a(x, y) < 77 + 2 sup Pq^ n 

x£X 


W X , N - 1 


> 


2 ( 1 + 7 ?) 


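Proof of Theorem 3.3. Using Lemma 3.2 and Lemma 3.4 with $\eta = \delta$,

\[ \tilde{P}_N V(x) - PV(x) = \int_{\mathcal{X}} q(x,dz) \left( \tilde{\alpha}_N(x,z) - \alpha(x,z) \right) V(z) + V(x) \left( \tilde{\rho}_N(x) - \rho(x) \right) \]
\[ \le \left( \delta + 2 \sup_{x \in \mathcal{X}} \mathbb{P}\left[ |W_{x,N} - 1| \ge \frac{\delta}{2(1+\delta)} \right] \right) qV(x) + \left( \delta + 2 \sup_{x \in \mathcal{X}} \mathbb{P}\left[ |W_{x,N} - 1| \ge \frac{\delta}{2} \right] \right) V(x) \]
\[ \le \left( \delta + 2 \sup_{x \in \mathcal{X}} \mathbb{P}\left[ |W_{x,N} - 1| \ge \frac{\delta}{2(1+\delta)} \right] \right) \left( qV(x) + V(x) \right). \]

By (W1), there exists $N_0 \in \mathbb{N}^+$ such that

\[ \sup_{x \in \mathcal{X}} \mathbb{P}\left[ |W_{x,N} - 1| \ge \frac{\delta}{2(1+\delta)} \right] < \frac{\varepsilon}{4}, \]

whenever $N \ge N_0$. This implies

\[ \tilde{P}_N V(x) \le PV(x) + \left( \delta + \frac{\varepsilon}{2} \right) \left( qV(x) + V(x) \right), \]

and using (P1*)

\[ \tilde{P}_N V(x) \le \left( \lambda + \left( \delta + \frac{\varepsilon}{2} \right) (K + 1) \right) V(x) + b 1_{\{x \in C\}}. \]

Taking $\delta = \varepsilon/2$ and $\varepsilon \in \left( 0, \frac{1-\lambda}{K+1} \right)$, the noisy chain $\tilde{P}_N$ also satisfies a geometric drift condition for the same
function $V$ and small set $C$, completing the proof. □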

In general, assumption (P1*) may be difficult to verify as one must identify a particular function $V$,
but it is easily satisfied when restricting to log-Lipschitz targets and when using a random walk proposal
of the form

\[ q(x,dy) = q(\|y - x\|)\,dy, \tag{17} \]

where $\|\cdot\|$ denotes the usual Euclidean norm. To see this the following assumption is required, which
is a particular case of (P1) and is satisfied under some extra technical conditions (see, e.g., Roberts and
Tweedie, 1996).
(P1**) $\mathcal{X} \subseteq \mathbb{R}^d$. The target $\pi$ is log-Lipschitz, meaning that for some $L > 0$

\[ |\log \pi(z) - \log \pi(x)| \le L \|z - x\|. \]

(P1) holds taking the drift function $V = \pi^{-s}$, for any $s \in (0,1)$. The proposal $q$ is a
random walk as in (17) satisfying

\[ \int_{\mathbb{R}^d} \exp\{a \|u\|\}\, q(\|u\|)\,du < \infty, \]

for some $a > 0$.

See Appendix A for a proof of the following proposition.
Proposition 3.1. Assume (P1**) and (W1). Then, (P1*) holds.
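As a rough numerical illustration of (P1*), the sketch below assumes a one-dimensional Laplace-type target $\pi(x) \propto \exp(-|x|)$ (log-Lipschitz with $L = 1$), a standard normal random walk increment and $V = \pi^{-s}$; these choices and the helper names are assumptions made for the example, not taken from the article. It compares a trapezoidal estimate of $qV(x)/V(x)$ with the constant $K = \mathbb{E}[\exp(s|U|)]$, $U \sim N(0,1)$.

```python
import math

def proposal_density(u):
    """Standard normal increment density (assumed random walk proposal)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def drift_ratio(x, s, half_width=12.0, n=24000):
    """Trapezoidal estimate of qV(x)/V(x) for pi(x) ~ exp(-|x|) and V = pi^{-s}."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        u = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        # V(x + u) / V(x) = exp(s * (|x + u| - |x|)) <= exp(s * |u|)
        total += weight * math.exp(s * (abs(x + u) - abs(x))) * proposal_density(u)
    return total * h

def bound_K(s):
    """Closed form of E[exp(s |U|)] for U ~ N(0, 1)."""
    return math.exp(0.5 * s * s) * (1.0 + math.erf(s / math.sqrt(2.0)))

s = 0.5
ratios = {x: drift_ratio(x, s) for x in (-3.0, 0.0, 0.5, 10.0)}
K = bound_K(s)
```

The bound is attained at $x = 0$ and is loose far from the origin, where the ratio approaches $\mathbb{E}[\exp(sU)] = \exp(s^2/2)$.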


3.3 Conditions for arithmetic averages 

In the particular setting where the weights are given by (4), sufficient conditions on these can be obtained
to ensure geometric ergodicity is inherited by the noisy chain. For the simple case where the weights are
homogeneous with respect to the state space, (W1) is automatically satisfied. In order to attain (W2),
the existence of a negative moment for a single weight is required. See Appendix A for a proof of the
following result.

Proposition 3.2. Assume weights as in (4). If $\mathbb{E}_{Q_x}\left[ W_x^{-1} \right] < \infty$ then

\[ \lim_{N \to \infty} \mathbb{E}_{Q_{x,N}}\left[ W_{x,N}^{-1} \right] = 1. \tag{18} \]
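The limit in (18) can be computed exactly for simple weights. The following sketch assumes single weights of the two-point form $(b - \varepsilon)\mathrm{Ber}(s) + \varepsilon$ with $\mathbb{E}[W] = 1$, so that $N W_{x,N}$ is an affine function of a binomial variable; the parameter values and function names are illustrative assumptions.

```python
import math

def binom_pmf(n, k, s):
    """Binomial pmf evaluated in log space to avoid overflow for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(s) + (n - k) * math.log(1.0 - s))
    return math.exp(log_pmf)

def inverse_moment(n, eps, b):
    """Exact E[1 / W_N] when W_N averages n i.i.d. copies of (b - eps) * Ber(s) + eps."""
    s = (1.0 - eps) / (b - eps)   # makes each weight have expectation one
    return sum(binom_pmf(n, k, s) / ((b - eps) * k / n + eps)
               for k in range(n + 1))

sizes = (1, 2, 5, 20, 200)
values = [inverse_moment(n, eps=0.5, b=3.0) for n in sizes]
for n, v in zip(sizes, values):
    print(f"N={n:4d}: E[1/W_N] = {v:.5f}")
```

The first negative moment decreases towards one as $N$ increases, in line with (18); the monotonicity is the content of part (iii) of Lemma 3.5 with $p = 1$.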


For homogeneous weights, (18) implies (W2). When the weights are not homogeneous, stronger
conditions are needed for (W1) and (W2) to be satisfied. An appropriate first assumption is that the
weights are uniformly integrable.

(W3) The weights $\{W_x\}_x$ satisfy

\[ \lim_{K \to \infty} \sup_{x \in \mathcal{X}} \mathbb{E}_{Q_x}\left[ W_x 1_{\{W_x > K\}} \right] = 0. \]

The second condition imposes an additional assumption on the distribution of the weights $\{W_x\}_x$ near 0.

(W4) There exists $\gamma \in (0,1)$ and constants $M < \infty$, $\beta > 0$ such that for $w \in (0,\gamma)$ the
weights $\{W_x\}_x$ satisfy

\[ \sup_{x \in \mathcal{X}} \mathbb{P}_{Q_x}\left[ W_x \le w \right] \le M w^{\beta}. \]

These new conditions ensure (W1) and (W2) are satisfied.

Proposition 3.3. For weights as in (4),

i. (W3) implies (W1);

ii. (W1) and (W4) imply (W2).

The following corollary is obtained as an immediate consequence of the above proposition, Theorem 3.2
and Theorem 3.3.

Corollary 3.1. Let the weights be as in (4). Assume (W3) and either

i. (P1) and (W4);

ii. (P1*).

Then, there exists $N_0 \in \mathbb{N}^+$ such that for all $N \ge N_0$, the noisy chain with transition kernel $\tilde{P}_N$ is
geometrically ergodic.
The proof of Proposition 3.3 follows the statement of Lemma 3.5, whose proof can be found in
Appendix A. This lemma allows us to characterise the distribution of $W_{x,N}$ near 0 assuming (W4) and
also provides conditions for the existence and convergence of negative moments.

Lemma 3.5. Let $\gamma \in (0,1)$ and $p > 0$.

i. Suppose $Z$ is a positive random variable, and assume that for $z \in (0,\gamma)$

\[ \mathbb{P}[Z \le z] \le M z^{\alpha}, \quad \text{where } \alpha > p,\ M < \infty. \]

Then,

\[ \mathbb{E}\left[ Z^{-p} \right] \le \frac{1}{\gamma^{p}} + pM \frac{\gamma^{\alpha - p}}{\alpha - p}. \]

ii. Suppose $\{Z_i\}_{i=1}^{N}$ is a collection of positive and independent random variables, and assume that for
each $i \in \{1, \dots, N\}$ and $z \in (0,\gamma)$

\[ \mathbb{P}[Z_i \le z] \le M_i z^{\alpha_i}, \quad \text{where } \alpha_i > 0,\ M_i < \infty. \]

Then, for $z \in (0,\gamma)$

\[ \mathbb{P}\left[ \sum_{i=1}^{N} Z_i \le z \right] \le \left( \prod_{i=1}^{N} M_i \right) z^{\sum_{i=1}^{N} \alpha_i}. \]

iii. Let the weights be as in (4). If for some $N_0 \in \mathbb{N}^+$

\[ \mathbb{E}_{Q_{x,N_0}}\left[ W_{x,N_0}^{-p} \right] < \infty, \]

then for any $N \ge N_0$

\[ \mathbb{E}_{Q_{x,N+1}}\left[ W_{x,N+1}^{-p} \right] \le \mathbb{E}_{Q_{x,N}}\left[ W_{x,N}^{-p} \right]. \]

iv. Assume (W1) and let $g : \mathbb{R}^+ \to \mathbb{R}$ be a function that is continuous at 1 and bounded on the interval
$[\gamma, \infty)$. Then

\[ \lim_{N \to \infty} \sup_{x \in \mathcal{X}} \mathbb{E}_{Q_{x,N}}\left[ |g(W_{x,N}) - g(1)|\, 1_{\{W_{x,N} \ge \gamma\}} \right] = 0. \]
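Part (i) of Lemma 3.5 can be sanity-checked on a case with a closed form. The sketch below assumes $Z \sim \mathrm{Uniform}(0,1)$, for which $\mathbb{P}[Z \le z] = z$ (so $M = 1$, $\alpha = 1$) and $\mathbb{E}[Z^{-p}] = 1/(1-p)$ for $p < 1$; the function names are hypothetical.

```python
def negative_moment_bound(p, M, alpha, gamma):
    """Right-hand side of Lemma 3.5(i): gamma^{-p} + p * M * gamma^{alpha - p} / (alpha - p)."""
    if not (alpha > p > 0.0 and 0.0 < gamma < 1.0):
        raise ValueError("need alpha > p > 0 and gamma in (0, 1)")
    return gamma ** (-p) + p * M * gamma ** (alpha - p) / (alpha - p)

def uniform_negative_moment(p):
    """Exact E[Z^{-p}] for Z ~ Uniform(0, 1), valid for 0 < p < 1."""
    return 1.0 / (1.0 - p)

table = [(p, gamma, uniform_negative_moment(p),
          negative_moment_bound(p, M=1.0, alpha=1.0, gamma=gamma))
         for p in (0.1, 0.5, 0.9) for gamma in (0.1, 0.5, 0.9)]
```

For this example the bound becomes tight as $\gamma \to 1$, since $1 + p/(1-p) = 1/(1-p)$.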

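Proof of Proposition 3.3. Part (i) is a consequence of Chandra (1989, Theorem 1). Assuming (W3), it
implies

\[ \lim_{N \to \infty} \sup_{x \in \mathcal{X}} \mathbb{E}\,|W_{x,N} - 1| = 0. \]

By Markov's inequality

\[ \mathbb{E}\,|W_{x,N} - 1| \ge \delta\, \mathbb{P}\left[ |W_{x,N} - 1| \ge \delta \right], \]

and the result follows.

To prove (ii), assume (W4) and by part (ii) of Lemma 3.5, for $w \in (0,\gamma)$

\[ \mathbb{P}\left[ N W_{x,N} \le w \right] \le M^{N} w^{N\beta}. \]

Take $p > 1$ and define $N_0 := \lfloor p/\beta \rfloor + 1$, then using part (i) of Lemma 3.5, if $N \ge N_0$

\[ \sup_{x \in \mathcal{X}} \mathbb{E}\left[ W_{x,N}^{-p} \right] \le \frac{N^{p}}{\gamma^{p}} + p N^{p} M^{N} \frac{\gamma^{N\beta - p}}{N\beta - p}. \]

Hence, by Hölder's inequality

\[ \mathbb{E}\left[ \left| W_{x,N}^{-1} - 1 \right| 1_{\{W_{x,N} \in (0,\gamma)\}} \right] \le \mathbb{E}\left[ W_{x,N}^{-1} 1_{\{W_{x,N} \in (0,\gamma)\}} \right] \le \left( \mathbb{E}\left[ W_{x,N}^{-p} \right] \right)^{\frac{1}{p}} \left( \mathbb{P}\left[ W_{x,N} < \gamma \right] \right)^{\frac{p-1}{p}}, \]

and applying part (iii) of Lemma 3.5, for $N \ge N_0$

\[ \mathbb{E}\left[ \left| W_{x,N}^{-1} - 1 \right| 1_{\{W_{x,N} \in (0,\gamma)\}} \right] \le \left( \mathbb{E}\left[ W_{x,N_0}^{-p} \right] \right)^{\frac{1}{p}} \left( \mathbb{P}\left[ W_{x,N} < \gamma \right] \right)^{\frac{p-1}{p}}. \]

Therefore,

\[ \sup_{x \in \mathcal{X}} \mathbb{E}\left[ \left| W_{x,N}^{-1} - 1 \right| 1_{\{W_{x,N} \in (0,\gamma)\}} \right] \le \left( \sup_{x \in \mathcal{X}} \mathbb{E}\left[ W_{x,N_0}^{-p} \right] \right)^{\frac{1}{p}} \left( \sup_{x \in \mathcal{X}} \mathbb{P}\left[ W_{x,N} < \gamma \right] \right)^{\frac{p-1}{p}}. \]

Since $\gamma < 1$ and by (W1)

\[ \lim_{N \to \infty} \sup_{x \in \mathcal{X}} \mathbb{P}\left[ W_{x,N} < \gamma \right] = 0, \]

implying

\[ \lim_{N \to \infty} \sup_{x \in \mathcal{X}} \mathbb{E}\left[ \left| W_{x,N}^{-1} - 1 \right| 1_{\{W_{x,N} \in (0,\gamma)\}} \right] = 0. \tag{19} \]

Now, for fixed $\gamma \in (0,1)$ the function $g(x) = x^{-1}$ is bounded and continuous on $[\gamma, \infty)$, implying by part
(iv) of Lemma 3.5

\[ \lim_{N \to \infty} \sup_{x \in \mathcal{X}} \mathbb{E}\left[ \left| W_{x,N}^{-1} - 1 \right| 1_{\{W_{x,N} \in [\gamma,\infty)\}} \right] = 0. \tag{20} \]

Finally, using (19) and (20)

\[ \lim_{N \to \infty} \sup_{x \in \mathcal{X}} \mathbb{E}\left| W_{x,N}^{-1} - 1 \right| = 0, \]

and by the triangle inequality

\[ \sup_{x \in \mathcal{X}} \mathbb{E}\left| W_{x,N}^{-1} - 1 \right| \ge \left| \sup_{x \in \mathcal{X}} \mathbb{E}\left[ W_{x,N}^{-1} \right] - 1 \right|, \]

the result follows. □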

3.4 Remarks on results 

Equipped with these results, we return to the examples in Section 2.3 and Section 2.4. Even though the
noisy chain can be transient in these examples, the behaviour is quite different when considering weights
that are arithmetic averages of the form in (4). Since in both examples the weights are uniformly bounded
by the constant $b$, they immediately satisfy (W1). Additionally, by Proposition 3.2, condition (W2) is
satisfied for the example in Section 2.3. This is not the case for the example in Section 2.4, but condition
(P1*) is satisfied by taking $V = \pi^{-1/2}$. Therefore, applying Theorem 3.2 and Theorem 3.3 to the examples in
Section 2.3 and in Section 2.4 respectively, as $N$ increases the corresponding chains will go from being
transient to geometrically ergodic.

Despite conditions (W1) and (W2) guaranteeing the inheritance of geometric ergodicity for the noisy
chain, they are not necessary. Consider a modification of the example in Section 2.3, where the weights
are given by

\[ W_{m,1} = (b_m - \varepsilon_m)\,\mathrm{Ber}(s_m) + \varepsilon_m, \quad \text{where } b_m > 1 \text{ and } \varepsilon_m \in (0,1] \text{ for all } m \ge 1. \]

Again, there exists a relationship between the variables $b_m$, $\varepsilon_m$ and $s_m$ for ensuring the expectation of
the weights is equal to one. Let $\mathrm{Bin}(N, s)$ denote a binomial distribution of parameters $N \in \mathbb{N}^+$ and
$s \in (0,1)$. Then, in the arithmetic average context, $W_{m,N}$ becomes

\[ W_{m,N} = \frac{b_m - \varepsilon_m}{N}\,\mathrm{Bin}(N, s_m) + \varepsilon_m, \quad \text{where } b_m > 1 \text{ and } \varepsilon_m \in (0,1] \text{ for all } m \ge 1. \tag{21} \]

For particular choices of the sequences $\{b_m\}_{m \in \mathbb{N}^+}$ and $\{\varepsilon_m\}_{m \in \mathbb{N}^+}$, the resulting noisy chain can be
geometrically ergodic for all $N \ge 1$, even though neither (W1) nor (W2) hold.
Proposition 3.4. Consider a geometric target density as in (11) and a proposal density as in (12). In
addition, let the weights be as in (21) with $b_m \to \infty$, $\varepsilon_m \to 0$ as $m \to \infty$ and

\[ \lim_{m \to \infty} \frac{\varepsilon_{m-1}}{\varepsilon_m} = l, \quad \text{where } l \in \mathbb{R}^+ \cup \{+\infty\}. \]

Then, the chain generated by the noisy kernel $\tilde{P}_N$ is geometrically ergodic for any $N \in \mathbb{N}^+$.

Finally, in many of the previous examples, increasing the value of $N$ seems to improve the ergodic
properties of the noisy chain. However, the geometric ergodicity property is not always inherited, no matter
how large $N$ is taken. The following proposition shows an example rather similar to Proposition 3.4,
but in which the ratio $\varepsilon_{m-1}/\varepsilon_m$ does not converge as $m \to \infty$.

Proposition 3.5. Consider a geometric target density as in (11) and a proposal density as in (12). In
addition, let the weights be as in (21) with $b_m = m$ and

\[ \varepsilon_m = m^{-(3 - (m \ (\mathrm{mod}\ 3)))}. \]

Then, the chain generated by the noisy kernel $\tilde{P}_N$ is transient for any $N \in \mathbb{N}^+$.
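The oscillation of the ratio $\varepsilon_{m-1}/\varepsilon_m$ in Proposition 3.5 is easy to see numerically. The short sketch below simply evaluates the sequence; the function names are illustrative.

```python
def eps(m):
    """epsilon_m = m^{-(3 - (m mod 3))}, the sequence used in Proposition 3.5."""
    return float(m) ** (-(3 - (m % 3)))

def ratio(m):
    """epsilon_{m-1} / epsilon_m."""
    return eps(m - 1) / eps(m)

for m in (300, 301, 302):
    print(f"m={m} (m mod 3 = {m % 3}): eps_(m-1)/eps_m = {ratio(m):.3e}")
```

The ratio is of order $m^2$ when $m$ is a multiple of 3 and of order $m^{-1}$ otherwise, so it has no limit, unlike in Proposition 3.4.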


4 Convergence of the noisy invariant distribution 

So far the only concern has been whether the noisy chain inherits the geometric ergodicity property
from the marginal chain. As an immediate consequence, geometric ergodicity guarantees the existence
of an invariant probability distribution $\tilde{\pi}_N$ for $\tilde{P}_N$, provided $N$ is large enough. In addition, using the
same conditions from Section 3, we can characterise and in some cases quantify the convergence in total
variation of $\tilde{\pi}_N$ towards the desired target $\pi$, as $N \to \infty$.

4.1 Convergence in total variation 

The following definition, taken from Roberts et al. (1998), characterises a class of kernels satisfying a
geometric drift condition as in (16) for the same $V$, $C$, $\lambda$ and $b$.

Definition 4.1 (Simultaneous geometric ergodicity). A class of Markov chain kernels $\{P_k\}_{k \in \mathcal{K}}$ is simultaneously
geometrically ergodic if there exists a class of probability measures $\{\nu_k\}_{k \in \mathcal{K}}$, a measurable set
$C \subseteq \mathcal{X}$, a real valued measurable function $V \ge 1$, a positive integer $n_0$ and positive constants $\varepsilon$, $\lambda$, $b$ such
that for each $k \in \mathcal{K}$:

i. $C$ is small for $P_k$, with $P_k^{n_0}(x, \cdot) \ge \varepsilon \nu_k(\cdot)$ for all $x \in C$;

ii. the chain $P_k$ satisfies the geometric drift condition in (16) with drift function $V$, small set $C$ and
constants $\lambda$ and $b$.

Provided $N$ is large, the noisy kernels $\{\tilde{P}_{N+k}\}_{k \ge 0}$ together with the marginal $P$ will be simultaneously
geometrically ergodic. This will allow the use of coupling arguments for ensuring $\tilde{\pi}_N$ and $\pi$ get arbitrarily
close in total variation. The main additional assumption is

(P2) For some $\varepsilon > 0$, some probability measure $\nu(\cdot)$ on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ and some subset $C \subseteq \mathcal{X}$,
the marginal acceptance probability $\alpha$ and the proposal kernel $q$ satisfy

\[ \alpha(x,y)\, q(x,dy) \ge \varepsilon \nu(dy), \quad \text{for } x \in C. \]

Remark 4.1. (P2) ensures the marginal chain satisfies the minorisation condition in (15), purely attained
by the sub-kernel $\alpha(x,y)q(x,dy)$. This occurs under fairly mild assumptions (see, e.g., Roberts and
Tweedie, 1996, Theorem 2.2).

Theorem 4.1. Assume (P1), (P2), (W1) and (W2). Alternatively, assume (P1*), (P2) and (W1).
Then,

i. there exists $N_0 \in \mathbb{N}^+$ such that the class of kernels $\{P, \tilde{P}_{N_0}, \tilde{P}_{N_0+1}, \dots\}$ is simultaneously geometrically ergodic;

ii. for all $x \in \mathcal{X}$, $\lim_{N \to \infty} \|\tilde{P}_N(x, \cdot) - P(x, \cdot)\|_{TV} = 0$;

iii. $\lim_{N \to \infty} \|\tilde{\pi}_N(\cdot) - \pi(\cdot)\|_{TV} = 0$.

Part (iii) of the above theorem is mainly a consequence of Roberts et al. (1998, Theorem 9) when
parts (i) and (ii) hold. Indeed, by the triangle inequality,

\[ \|\tilde{\pi}_N(\cdot) - \pi(\cdot)\|_{TV} \le \|\tilde{P}_N^{n}(x, \cdot) - \tilde{\pi}_N(\cdot)\|_{TV} + \|P^{n}(x, \cdot) - \pi(\cdot)\|_{TV} + \|\tilde{P}_N^{n}(x, \cdot) - P^{n}(x, \cdot)\|_{TV}. \tag{22} \]

Provided $N \ge N_0$, the first two terms in (22) can be made arbitrarily small by increasing $n$. In addition,
due to the simultaneous geometric ergodicity property, the first term in (22) is uniformly controlled
regardless of the value of $N$. Finally, using an inductive argument, part (ii) implies that for all $x \in \mathcal{X}$ and
all $n \in \mathbb{N}^+$

\[ \lim_{N \to \infty} \|\tilde{P}_N^{n}(x, \cdot) - P^{n}(x, \cdot)\|_{TV} = 0. \]

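Proof of Theorem 4.1. From the proofs of Theorem 3.2 and Theorem 3.3, there exists $N_2 \in \mathbb{N}^+$ such that
the class of kernels $\{P, \tilde{P}_{N_2}, \tilde{P}_{N_2+1}, \dots\}$ satisfies condition (ii) in Definition 4.1 for the same function $V$,
small set $C$ and constants $\lambda_{N_2}$, $b_{N_2}$. Respecting (i), for any $\delta \in (0,1)$

\[ \tilde{P}_N(x, A) \ge \int_A \tilde{\alpha}_N(x,z)\, q(x,dz) \]
\[ \ge \int_A \mathbb{E}\left[ \min\left\{ 1, \frac{W_{z,N}}{W_{x,N}} \right\} \right] \alpha(x,z)\, q(x,dz) \]
\[ \ge (1 - \delta) \int_A \left( 1 - \mathbb{P}\left[ \frac{W_{z,N}}{W_{x,N}} \le 1 - \delta \right] \right) \alpha(x,z)\, q(x,dz). \]

Then, by Lemma 3.1

\[ \tilde{P}_N(x, A) \ge (1 - \delta) \left( 1 - 2 \sup_{x \in \mathcal{X}} \mathbb{P}\left[ |W_{x,N} - 1| \ge \frac{\delta}{2} \right] \right) \int_A \alpha(x,z)\, q(x,dz). \]

By (W1), there exists $N_1 \in \mathbb{N}^+$ such that for $N \ge N_1$

\[ \sup_{x \in \mathcal{X}} \mathbb{P}\left[ |W_{x,N} - 1| \ge \frac{\delta}{2} \right] \le \frac{\delta}{2}, \]

giving

\[ \tilde{P}_N(x, A) \ge (1 - \delta)^{2} \int_A \alpha(x,z)\, q(x,dz). \]

Due to (P2),

\[ \tilde{P}_N(x, A) \ge (1 - \delta)^{2} \varepsilon \nu(A), \quad \text{for } x \in C. \]

Finally, take $N_0 = \max\{N_1, N_2\}$ implying (i).

To prove (ii) apply Lemma 3.2 and Lemma 3.4 to get

\[ \sup_{A \in \mathcal{B}(\mathcal{X})} \left\{ \tilde{P}_N(x, A) - P(x, A) \right\} \le \left( \eta + 2 \sup_{x \in \mathcal{X}} \mathbb{P}_{Q_{x,N}}\left[ |W_{x,N} - 1| \ge \frac{\eta}{2(1+\eta)} \right] \right) \sup_{A \in \mathcal{B}(\mathcal{X})} q(x, A) + \left( \tilde{\rho}_N(x) - \rho(x) \right) \sup_{A \in \mathcal{B}(\mathcal{X})} 1_{\{x \in A\}} \]
\[ \le \left( \eta + 2 \sup_{x \in \mathcal{X}} \mathbb{P}_{Q_{x,N}}\left[ |W_{x,N} - 1| \ge \frac{\eta}{2(1+\eta)} \right] \right) + \left( \delta + 2 \sup_{x \in \mathcal{X}} \mathbb{P}_{Q_{x,N}}\left[ |W_{x,N} - 1| \ge \frac{\delta}{2} \right] \right). \tag{23} \]

Finally, taking $N \to \infty$ and by (W1)

\[ \lim_{N \to \infty} \sup_{A \in \mathcal{B}(\mathcal{X})} \left| \tilde{P}_N(x, A) - P(x, A) \right| \le \eta + \delta. \]

The result follows since $\eta$ and $\delta$ can be taken arbitrarily small.

For (iii), see Theorem 9 in Roberts et al. (1998) for a detailed proof. □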
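4.2 Rate of convergence

Let $(\tilde{\Phi}_n^{N})_{n \ge 0}$ denote the noisy chain and $(\Phi_n)_{n \ge 0}$ the marginal chain, which move according to the
kernels $\tilde{P}_N$ and $P$, respectively. Let $c_x := 1 - \|\tilde{P}_N(x, \cdot) - P(x, \cdot)\|_{TV}$, using notions of maximal coupling
for random variables defined on a Polish space (see Lindvall 2002 and Thorisson 2013). In particular,
there exists a probability measure $\nu_x(\cdot)$ such that

\[ P(x, \cdot) \ge c_x \nu_x(\cdot) \quad \text{and} \quad \tilde{P}_N(x, \cdot) \ge c_x \nu_x(\cdot). \]

Let $c := \inf_{x \in \mathcal{X}} c_x$, and define a coupling in the following way:

• If $\tilde{\Phi}_{n-1}^{N} = \Phi_{n-1} = y$, with probability $c$ draw $\Phi_n \sim \nu_y(\cdot)$ and set $\tilde{\Phi}_n^{N} = \Phi_n$. Otherwise, draw
independently $\Phi_n \sim R(y, \cdot)$ and $\tilde{\Phi}_n^{N} \sim \tilde{R}_N(y, \cdot)$, where

\[ R(y, \cdot) := (1 - c)^{-1} \left( P(y, \cdot) - c \nu_y(\cdot) \right) \quad \text{and} \quad \tilde{R}_N(y, \cdot) := (1 - c)^{-1} \left( \tilde{P}_N(y, \cdot) - c \nu_y(\cdot) \right). \]

• If $\tilde{\Phi}_{n-1}^{N} \ne \Phi_{n-1}$, draw independently $\Phi_n \sim P(\Phi_{n-1}, \cdot)$ and $\tilde{\Phi}_n^{N} \sim \tilde{P}_N(\tilde{\Phi}_{n-1}^{N}, \cdot)$.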

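Since

\[ \mathbb{P}\left[ \tilde{\Phi}_n^{N} \ne \Phi_n \,\middle|\, \tilde{\Phi}_0^{N} = \Phi_0 = x \right] = \mathbb{P}\left[ \tilde{\Phi}_n^{N} \ne \Phi_n, \tilde{\Phi}_{n-1}^{N} = \Phi_{n-1} \,\middle|\, \tilde{\Phi}_0^{N} = \Phi_0 = x \right] + \mathbb{P}\left[ \tilde{\Phi}_n^{N} \ne \Phi_n, \tilde{\Phi}_{n-1}^{N} \ne \Phi_{n-1} \,\middle|\, \tilde{\Phi}_0^{N} = \Phi_0 = x \right] \]
\[ \le 1 - c + \mathbb{P}\left[ \tilde{\Phi}_{n-1}^{N} \ne \Phi_{n-1} \,\middle|\, \tilde{\Phi}_0^{N} = \Phi_0 = x \right], \]

and noting

\[ \mathbb{P}\left[ \tilde{\Phi}_1^{N} \ne \Phi_1 \,\middle|\, \tilde{\Phi}_0^{N} = \Phi_0 = x \right] \le \sup_{x \in \mathcal{X}} \|\tilde{P}_N(x, \cdot) - P(x, \cdot)\|_{TV} = 1 - c, \]

an induction argument can be applied to obtain

\[ \mathbb{P}\left[ \tilde{\Phi}_n^{N} \ne \Phi_n \,\middle|\, \tilde{\Phi}_0^{N} = \Phi_0 = x \right] \le n \sup_{x \in \mathcal{X}} \|\tilde{P}_N(x, \cdot) - P(x, \cdot)\|_{TV}. \]

Therefore, using the coupling inequality, the third term in (22) can be bounded by

\[ \|\tilde{P}_N^{n}(x, \cdot) - P^{n}(x, \cdot)\|_{TV} \le \mathbb{P}\left[ \tilde{\Phi}_n^{N} \ne \Phi_n \,\middle|\, \tilde{\Phi}_0^{N} = \Phi_0 = x \right] \le n \sup_{x \in \mathcal{X}} \|\tilde{P}_N(x, \cdot) - P(x, \cdot)\|_{TV}. \tag{24} \]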
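The bound (24) holds for any pair of Markov kernels, which makes it easy to check on a finite state space. The sketch below uses two hypothetical three-state transition matrices (stand-ins for $P$ and $\tilde{P}_N$, not derived from any example in the article) and compares the $n$-step total variation distance with $n$ times the one-step distance.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def tv(mu, nu):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(m - v) for m, v in zip(mu, nu))

def sup_row_tv(a, b):
    """sup over starting states x of ||a(x, .) - b(x, .)||_TV."""
    return max(tv(row_a, row_b) for row_a, row_b in zip(a, b))

# Hypothetical kernels: P plays the marginal chain, P_noisy a perturbation of it.
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
P_noisy = [[0.55, 0.45, 0.0], [0.2, 0.55, 0.25], [0.0, 0.45, 0.55]]

one_step = sup_row_tv(P, P_noisy)
Pn, Pn_noisy = P, P_noisy
report = []                      # (n, n-step distance, n * one-step distance)
for n in range(1, 11):
    report.append((n, sup_row_tv(Pn, Pn_noisy), n * one_step))
    Pn, Pn_noisy = mat_mul(Pn, P), mat_mul(Pn_noisy, P_noisy)
```

In this toy case the linear bound is conservative for larger $n$, because both chains contract towards nearby stationary distributions; this is the effect exploited when (24) is combined with geometric ergodicity.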


On the other hand, using the simultaneous geometric ergodicity of the kernels and provided $N$ is
large enough, the noisy and marginal kernels will each satisfy a geometric drift condition as in (16) with
a common drift function $V \ge 1$, small set $C$ and constants $\lambda$, $b$. Therefore, by Theorem 3.1, there exist
$R > 0$ and $\tau < 1$ such that

\[ \|\tilde{P}_N^{n}(x, \cdot) - \tilde{\pi}_N(\cdot)\|_{TV} \le R V(x) \tau^{n} \quad \text{and} \quad \|P^{n}(x, \cdot) - \pi(\cdot)\|_{TV} \le R V(x) \tau^{n}. \tag{25} \]

Explicit values for $R$ and $\tau$ are in principle possible, as done in Rosenthal (1995) and Meyn and Tweedie
(1994). For simplicity assume $\inf_{x \in \mathcal{X}} V(x) = 1$, then combining (24) and (25) in (22), for all $n \in \mathbb{N}^+$

\[ \|\tilde{\pi}_N(\cdot) - \pi(\cdot)\|_{TV} \le 2R\tau^{n} + n \sup_{x \in \mathcal{X}} \|\tilde{P}_N(x, \cdot) - P(x, \cdot)\|_{TV}. \tag{26} \]

So, if an analytic expression in terms of $N$ is available for the second term on the right hand side of (26),
it will be possible to obtain an explicit rate of convergence for $\tilde{\pi}_N$ and $\pi$.
Theorem 4.2. Assume (P1), (P2), (W1) and (W2). Alternatively, assume (P1*), (P2) and (W1). In
addition, suppose

\[ \sup_{x \in \mathcal{X}} \|\tilde{P}_N(x, \cdot) - P(x, \cdot)\|_{TV} \le \frac{1}{r(N)}, \]

where $r : \mathbb{N}^+ \to \mathbb{R}^+$ and $\lim_{N \to \infty} r(N) = +\infty$. Then, there exists $D > 0$ and $N_0 \in \mathbb{N}^+$ such that for all
$N \ge N_0$,

\[ \|\tilde{\pi}_N(\cdot) - \pi(\cdot)\|_{TV} \le D\, \frac{\log(r(N))}{r(N)}. \]

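Proof. Let $R > 0$, $\tau \in (0,1)$ and $r > 0$. Pick $r$ large enough, such that

\[ \log\left( 2Rr \log(\tau^{-1}) \right) \ge 1, \]

then the convex function $f : [1, \infty) \to \mathbb{R}^+$ where

\[ f(s) = 2R\tau^{s} + \frac{s}{r}, \]

is minimised at

\[ s_* = \frac{\log\left( 2Rr \log(\tau^{-1}) \right)}{\log(\tau^{-1})}. \]

Restricting the domain of $f$ to the positive integers and due to convexity, it is then minimised at either

\[ n_1 = \lfloor s_* \rfloor \quad \text{or} \quad n_2 = \lceil s_* \rceil. \]

In any case

\[ \min\{f(n_1), f(n_2)\} \le f(s_* + 1) = \frac{1}{r} \left( 1 + \frac{\tau + \log\left( 2Rr \log(\tau^{-1}) \right)}{\log(\tau^{-1})} \right). \]

Finally take $N$ large enough such that

\[ \log\left( 2R\, r(N) \log(\tau^{-1}) \right) \ge 1, \]

and from (26)

\[ \|\tilde{\pi}_N(\cdot) - \pi(\cdot)\|_{TV} \le \min\{f(n_1), f(n_2)\} \le \frac{1}{r(N)} \left( 1 + \frac{\tau + \log\left( 2R\, r(N) \log(\tau^{-1}) \right)}{\log(\tau^{-1})} \right) = O\left( \frac{\log(r(N))}{r(N)} \right), \]

obtaining the result. □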
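The optimisation step in the proof can be reproduced numerically. The sketch below evaluates $f(s) = 2R\tau^{s} + s/r$, its continuous minimiser $s_*$ and the closed form of $f(s_* + 1)$; the values $R = 1$, $\tau = 0.5$ and $r = 1000$ are arbitrary illustrative choices.

```python
import math

def f(s, R, tau, r):
    """Right-hand side of (26) with n replaced by s and the one-step distance by 1/r."""
    return 2.0 * R * tau ** s + s / r

def s_star(R, tau, r):
    """Continuous minimiser of f."""
    return math.log(2.0 * R * r * math.log(1.0 / tau)) / math.log(1.0 / tau)

def bound_at_s_star_plus_one(R, tau, r):
    """Closed form of f(s_* + 1) used in the proof."""
    log_inv_tau = math.log(1.0 / tau)
    return (1.0 + (tau + math.log(2.0 * R * r * log_inv_tau)) / log_inv_tau) / r

R, tau, r = 1.0, 0.5, 1000.0
s = s_star(R, tau, r)
n1, n2 = math.floor(s), math.ceil(s)
best_integer_value = min(f(n1, R, tau, r), f(n2, R, tau, r))
closed_form = bound_at_s_star_plus_one(R, tau, r)
print(f"s* = {s:.3f}, best integer value = {best_integer_value:.6f}, bound = {closed_form:.6f}")
```

The optimal number of steps $n$ grows like $\log r$, which is where the logarithmic factor in Theorem 4.2 comes from.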


Moreover, when the weights are expressed in terms of arithmetic averages as in (4), an explicit
expression for $r(N)$ can be obtained whenever there exists a uniformly bounded moment. This is a
slightly stronger assumption than (W3).

(W5) There exists $k > 0$, such that the weights $\{W_x\}_x$ satisfy

\[ \sup_{x \in \mathcal{X}} \mathbb{E}_{Q_x}\left[ W_x^{1+k} \right] < \infty. \]

Proposition 4.1. Assume (P1), (P2), (W4) and (W5). Alternatively, assume (P1*), (P2) and (W5).
Then, there exists $D_k > 0$ and $N_0 \in \mathbb{N}^+$ such that for all $N \ge N_0$,

\[ \|\tilde{\pi}_N(\cdot) - \pi(\cdot)\|_{TV} \le D_k\, \frac{\log(N)}{N^{1 - \frac{2}{2+k}}}. \]

If in addition (W5) holds for all $k > 0$, then for any $\varepsilon \in (0,1)$ there will exist $D_\varepsilon > 0$ and $N_0 \in \mathbb{N}^+$ such
that for all $N \ge N_0$,

\[ \|\tilde{\pi}_N(\cdot) - \pi(\cdot)\|_{TV} \le D_\varepsilon\, \frac{\log(N)}{N^{1-\varepsilon}}. \]
5 Discussion


In this article, fundamental stability properties of the noisy algorithm have been explored. The noisy
Markov kernels considered are perturbed Metropolis-Hastings kernels defined by a collection of state-
dependent distributions for non-negative weights all with expectation 1. The general results do not
assume a specific form for these weights, which can be simple arithmetic averages or more complex
random variables. The former may arise when unbiased importance sampling estimates of a target
density are used, while the latter may arise when such densities are estimated unbiasedly using a particle
filter.

Two different sets of sufficient conditions were provided under which the noisy chain inherits geometric
ergodicity from the marginal chain. The first pair of conditions, (W1) and (W2), involve a stronger version
of the Law of Large Numbers for the weights and uniform convergence of the first negative moment,
respectively. For the second set, (W1) is still required but (W2) can be replaced with (P1*), which
imposes a condition on the proposal distribution. These conditions also imply simultaneous geometric
ergodicity of a sequence of noisy Markov kernels together with the marginal Markov kernel, which then
ensures that the noisy invariant $\tilde{\pi}_N$ converges to $\pi$ in total variation as $N$ increases. Moreover, an
explicit bound for the rate of convergence between $\tilde{\pi}_N$ and $\pi$ is possible whenever an explicit bound (that
is uniform in $x$) is available for the convergence between $\tilde{P}_N(x, \cdot)$ and $P(x, \cdot)$.

When weights are arithmetic averages as in (4), specific conditions were given for inheriting geometric
ergodicity from the corresponding marginal chain. The uniform integrability condition in (W3) ensures
that (W1) is satisfied, whereas (W4) is essential for satisfying (W2). Regarding the noisy invariant
distribution $\tilde{\pi}_N$, (W5), which is slightly stronger than (W3), leads to an explicit bound on the rate of
convergence of this distribution to $\pi$.

The noisy algorithm remains undefined when the weights have positive probability of being zero. If
both weights were zero one could accept the move, reject the move or keep sampling new weights until
one of them is not zero. Each of these leads to different behaviour.
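The three conventions for the case of two zero weights can be made explicit in code. The sketch below is purely illustrative: the function name, the `both_zero` switch and the treatment of a single zero current weight as an infinite ratio are all assumptions introduced here, not part of the algorithm as defined in the article.

```python
def noisy_acceptance(r, w_x, w_z, both_zero="reject"):
    """Acceptance probability min{1, r * w_z / w_x} with an explicit zero-weight rule.

    r is the exact Metropolis-Hastings ratio; `both_zero` is a hypothetical switch
    selecting one of the three conventions mentioned in the text.
    """
    if w_x < 0.0 or w_z < 0.0:
        raise ValueError("weights must be non-negative")
    if w_x == 0.0 and w_z == 0.0:
        if both_zero == "accept":
            return 1.0
        if both_zero == "reject":
            return 0.0
        if both_zero == "resample":
            return None  # signal: the caller should draw fresh weights and retry
        raise ValueError(f"unknown rule: {both_zero}")
    if w_x == 0.0:
        return 1.0  # assumption: a zero current weight is treated as an infinite ratio
    return min(1.0, r * w_z / w_x)
```

Each choice of `both_zero` defines a different Markov kernel, so stability results would have to be re-derived for whichever convention is adopted.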

As seen in the examples of Section 3.4, the behaviour of the ratio of the weights (at least in the 
tails of the target) plays an important role in the ergodic properties of the noisy chain. In this context, 
it seems plausible to obtain geometric noisy chains, even when the marginal is not, if the ratio of the 
weights decays sufficiently fast to zero in the tails. Another interesting possibility, that may lead to future 
research, is to relax the condition on the expectation of the weights to be identically one. 
A Proofs


A.1 On state-dependent random walks

The following proposition for state-dependent Markov chains on the positive integers will be useful for
addressing some proofs. See Norris (1999) for a proof of parts (i) and (ii); for part (iii) see Callaert and
Keilson (1973), where it is proved within the birth-death process context.

Proposition A.1. Suppose we have a random walk $\Phi$ on $\mathbb{N}^+$ with transition kernel $P$. Define for $m \ge 1$

\[ p_m := P(m, \{m+1\}) \quad \text{and} \quad q_m := P(m, \{m-1\}), \]

with $q_1 = 0$, $p_1 \in (0,1]$ and $p_m, q_m > 0$, $p_m + q_m \le 1$ for all $m \ge 2$. The resulting chain is:

i. recurrent if and only if

\[ \sum_{m=2}^{\infty} \prod_{i=2}^{m} \frac{q_i}{p_i} = \infty; \]

ii. positive recurrent if and only if

\[ \sum_{m=2}^{\infty} \prod_{i=2}^{m} \frac{p_{i-1}}{q_i} < \infty; \]

iii. geometrically ergodic if

\[ \lim_{m \to \infty} p_m < \lim_{m \to \infty} q_m. \]

Remark A.1. Notice that (iii) is not an if and only if statement and that it implies (ii). Additionally, if
the chain is not state-dependent, (ii) implies (iii).
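For a walk that is not state-dependent, the two series in Proposition A.1 are geometric and their behaviour can be read off from partial sums. The sketch below assumes constant $p_m = p$ and $q_m = q$ for all $m$ (including $p_1 = p$, a simplification made for the example); the helper name is hypothetical.

```python
def partial_sums(p, q, terms):
    """Partial sums of the series in Proposition A.1 (i) and (ii) for constant p_m = p, q_m = q."""
    recurrence_sum, positive_sum = 0.0, 0.0
    recurrence_prod, positive_prod = 1.0, 1.0
    for _ in range(terms):
        recurrence_prod *= q / p      # running product of q_i / p_i
        positive_prod *= p / q        # running product of p_{i-1} / q_i
        recurrence_sum += recurrence_prod
        positive_sum += positive_prod
    return recurrence_sum, positive_sum

# Drift towards the origin: first series diverges, second converges.
left_drift = partial_sums(p=0.3, q=0.5, terms=50)
# Drift away from the origin: first series converges, so the walk is transient.
right_drift = partial_sums(p=0.5, q=0.3, terms=50)
```

With $p < q$ the second series converges to $(p/q)/(1 - p/q)$, matching positive recurrence, while with $p > q$ the first series converges and the chain is transient.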


A.2 Section 2 

Proof of Proposition 2.1. Since h is convex 

h{m) — h(m — 1) > h'(m — 1) and h(m) — h(m + 1) < —h'(m), 


implying 


Pjy(w, {to - 1}) 
P N (m,{m + 1 }) 


E 


> 


w 


(i) 


min ^ l,exp {h'(m - l)}^ 5 y 


E 


min ^ l,exp{-h'(m)}^j 


(i) 




C 1 ) 

Define Z := —jjj, and since tt( rtf) —>- 0 it is true that 

tC)\r 


log(fc) := lim h'(m) > 0, 


(27) 


hence 


P/v(m, {to— 1}) > E [min{l, fcZ}] 


P N (m, {m + 1}) E [min {1, k X Z}\ 


(28) 


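If $k = +\infty$, it is clear that the limit in (28) diverges, consequently the noisy chain is geometrically
ergodic according to Proposition A.1. If $k < \infty$, the noisy chain will be geometrically ergodic if

\[ \mathbb{E}\left[ \min\{1, kZ\} \right] > \mathbb{E}\left[ \min\{1, k^{-1}Z\} \right], \]

which can be translated to

\[ k\, \mathbb{E}\left[ Z 1_{\{Z < k^{-1}\}} \right] + \mathbb{P}\left[ Z \ge k^{-1} \right] > k^{-1}\, \mathbb{E}\left[ Z 1_{\{Z < k\}} \right] + \mathbb{P}\left[ Z \ge k \right], \]

or equivalently to

\[ k\, \mathbb{P}\left[ k^{-1} \le Z < k \right] + \left( k^{2} - 1 \right) \mathbb{E}\left[ Z 1_{\{Z < k^{-1}\}} \right] > \mathbb{E}\left[ Z 1_{\{k^{-1} \le Z < k\}} \right]. \tag{29} \]

Now consider two cases, first if $\mathbb{P}\left[ k^{-1} \le Z < k \right] > 0$ then it is clear that

\[ \mathbb{E}\left[ (k - Z) 1_{\{k^{-1} \le Z < k\}} \right] > 0, \]

which satisfies (29). Finally, if $\mathbb{P}\left[ k^{-1} \le Z < k \right] = 0$ then

\[ \mathbb{P}\left[ Z < k^{-1} \right] = \frac{1}{2} = \mathbb{P}\left[ Z \ge k \right], \]

implying from (27)

\[ \left( k^{2} - 1 \right) \mathbb{E}\left[ Z 1_{\{Z < k^{-1}\}} \right] > 0, \]

and leading to (29). □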

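Proof of Proposition 2.2. For simplicity the subscript $N$ is dropped. In this case,

\[ Q_{m,1} = Q = \mathcal{L}\left( (b - \varepsilon)\,\mathrm{Ber}(s) + \varepsilon \right), \]

and the condition $\mathbb{E}_Q[W] = 1$ implies

\[ s = \frac{1 - \varepsilon}{b - \varepsilon}. \tag{30} \]

Let $\theta \in (0,1)$ satisfy $(1-\theta)/(2\theta) \le 1$ and set

\[ b = \varepsilon\, \frac{2\theta}{1 - \theta}; \tag{31} \]

this implies $\alpha(m, w; m-1, u) = 1$ and

\[ \alpha(m, w; m+1, u) = \begin{cases} 1 & \text{if } u = b,\ w = \varepsilon, \\ \frac{1-\theta}{2\theta} & \text{if } u = w, \\ \left( \frac{1-\theta}{2\theta} \right)^{2} & \text{if } u = \varepsilon,\ w = b. \end{cases} \]

Therefore, for $m \ge 2$,

\[ \tilde{\alpha}(m, m-1) = 1 \quad \text{and} \quad \tilde{\alpha}(m, m+1) = \frac{1-\theta}{2\theta} \left( s^{2} + (1-s)^{2} \right) + \left( 1 + \left( \frac{1-\theta}{2\theta} \right)^{2} \right) s(1-s). \]

Consequently, $\tilde{P}(m, \{m-1\}) = 1 - \theta$ and

\[ \tilde{P}(m, \{m+1\}) = \theta \left( \frac{1-\theta}{2\theta} \left( s^{2} + (1-s)^{2} \right) + \left( 1 + \left( \frac{1-\theta}{2\theta} \right)^{2} \right) s(1-s) \right) \ge \theta s(1-s). \]

From Proposition A.1, if

\[ \tilde{P}(m, \{m+1\}) > \tilde{P}(m, \{m-1\}), \]

then the noisy chain will be transient. For this to happen, it is enough to pick $\theta$ and $s$ such that

\[ \theta s(1-s) - (1-\theta) > 0. \]

Let $s = \varepsilon$, then from (30) and (31)

\[ \theta = \frac{1 - \varepsilon + \varepsilon^{2}}{1 - \varepsilon + 3\varepsilon^{2}} = 1 - \frac{2\varepsilon^{2}}{1 - \varepsilon + 3\varepsilon^{2}}, \tag{32} \]

and if $\varepsilon < 2 - \sqrt{3}$ then

\[ \theta s(1-s) - (1-\theta) = \frac{\varepsilon}{1 - \varepsilon + 3\varepsilon^{2}} \left( (1 - \varepsilon + \varepsilon^{2})(1 - \varepsilon) - 2\varepsilon \right) \ge \frac{\varepsilon}{1 - \varepsilon + 3\varepsilon^{2}} \left( (1-\varepsilon)^{2} - 2\varepsilon \right) = \frac{\varepsilon}{1 - \varepsilon + 3\varepsilon^{2}} \left( (2-\varepsilon)^{2} - 3 \right) > 0. \]

Hence, for $\varepsilon \in (0, 2 - \sqrt{3})$ and setting $s = \varepsilon$, $\theta$ as in (32) and $b$ as in (31), the resulting noisy chain
is transient. □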
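The constants in this proof can be verified numerically. The sketch below computes $\theta$ from (32) and $b$ from (31) with $s = \varepsilon$, then checks that the weights have unit expectation and that the transience margin $\theta s(1-s) - (1-\theta)$ is positive; function names are illustrative.

```python
import math

def constants(eps):
    """Return (theta, b, s) with s = eps, theta as in (32) and b as in (31)."""
    theta = (1.0 - eps + eps ** 2) / (1.0 - eps + 3.0 * eps ** 2)
    b = eps * 2.0 * theta / (1.0 - theta)
    return theta, b, eps

def transience_margin(eps):
    """theta * s * (1 - s) - (1 - theta); positive values force drift to the right."""
    theta, _, s = constants(eps)
    return theta * s * (1.0 - s) - (1.0 - theta)

def mean_weight(eps):
    """E[W] for W = (b - eps) * Ber(s) + eps."""
    _, b, s = constants(eps)
    return s * b + (1.0 - s) * eps

threshold = 2.0 - math.sqrt(3.0)
grid = [0.05, 0.1, 0.2, 0.25]          # all below 2 - sqrt(3), roughly 0.268
margins = [transience_margin(e) for e in grid]
means = [mean_weight(e) for e in grid]
```

For instance $\varepsilon = 0.2$ gives $\theta = 21/23$ and $b = 4.2$, so the noisy chain moves right with probability at least $\theta s(1-s) \approx 0.146$ against $1 - \theta \approx 0.087$ to the left.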

Proof of Proposition 2.3. For simplicity the subscript $N$ is dropped. In this case,

\[ Q_{m,1} = Q_m = \mathcal{L}\left( (b - \varepsilon_m)\,\mathrm{Ber}(s_m) + \varepsilon_m \right), \]

and the condition $\mathbb{E}_{Q_m}[W_m] = 1$ implies

\[ s_m = \frac{1 - \varepsilon_m}{b - \varepsilon_m}. \]

Then, for m large enough 

a(m, iTi 1) — E [a(m, W m ; m - 1, W m - 1 )] 

= min jl, S m-lS m + s m _i(l - s m ) + (1 - s m _i)(l - S m )l{n 

+ O (to -1 ) and 

a(m, m — 1) = E [a( to, W m ; to - 1, W m _i)] 

= min ^ 1, — ^ (1 + (1 s m )(l Sm+ijlll 71 

+ O (to -1 ) . 


Define 


Cm • — 


P(m, {to — 1}) 
P(m, {to + 1 }) 

(1 — 9)a{m , m — 1) 
9a{m , to + 1) 


Since s m —► ^ as to —► oo, 

Co,oo •— ,lim C31; 

9\ (“inf 1 , i^} - X ) F + i +(!-!)' 


k—too 

l - 


< 


i-i 


0 ) W 

l-i 


e b -i 


=: Z, 


o, 


and 


Cl,oo • lim C3fcC3fc-(_i 
k—to o 


— C(),c 


< Zo 


(minjl, T^j-l) £ + g 

0 ) (minjl, -1)^ + 5 + (I-5)" 


1 - 6 > 


l-i 


i-i 


1-1 


9 ) ( 6 - 1)5 


—: /1 


lim C3fcC3fc + iC3fc + 2 — Cl 0 
k—>oo 


<h 


l-e\ (minj 1 ,^}- 1 ) ^ + 


9 ) (minjl, i^/}- 1)^ + 1 



1 

1 b 

V 0 J 

1 1 

'1 - 

P b) 


1 - 


(6-1) 3 


=:Z 2 . 


(mod 3)=0} 


(mod 3)^2} 
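Therefore, for any $\delta > 0$ there exists $k_0 \in \mathbb{N}^+$, such that whenever $k \ge k_0 + 1$

\[ K := \prod_{j=k_0}^{k-1} c_{3j}\, c_{3j+1}\, c_{3j+2} \le (l_2 + \delta)^{k - k_0}, \]

implying

\[ K c_{3k} \le (l_2 + \delta)^{k - k_0} (l_0 + \delta), \]
\[ K c_{3k} c_{3k+1} \le (l_2 + \delta)^{k - k_0} (l_1 + \delta) \quad \text{and} \]
\[ K c_{3k} c_{3k+1} c_{3k+2} \le (l_2 + \delta)^{k - k_0} (l_2 + \delta). \]

Hence, for $i \in \{0, 1, 2\}$ and some $C > 0$

\[ \prod_{j=2}^{3k+i} c_j \le C (l_2 + \delta)^{k}. \]

Let $a_m := \prod_{j=2}^{m} c_j$, then a sufficient condition for the series $\sum_{m=2}^{\infty} a_m$ to converge, implying a transient
chain according to Proposition A.1, is $l_2 < 1$. This is the case for $b \ge 3 + \left( \frac{1-\theta}{\theta} \right)^{3}$, since

\[ 1 - l_2 = 1 - \left( \frac{1-\theta}{\theta} \right)^{3} \frac{b^{2}}{(b-1)^{3}} = \frac{b^{2}}{(b-1)^{3}} \left( \frac{(b-1)^{3}}{b^{2}} - \left( \frac{1-\theta}{\theta} \right)^{3} \right) \]
\[ = \frac{b^{2}}{(b-1)^{3}} \left( b - 3 + \frac{3}{b} - \frac{1}{b^{2}} - \left( \frac{1-\theta}{\theta} \right)^{3} \right) > \frac{b^{2}}{(b-1)^{3}} \left( b - 3 - \left( \frac{1-\theta}{\theta} \right)^{3} \right) \ge 0. \]

Hence, the resulting noisy chain is transient if $b \ge 3 + \left( \frac{1-\theta}{\theta} \right)^{3}$, for any $\theta \in (0,1)$. □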

A.3 Section 3 

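Proof of Lemma 3.1. For any $\delta > 0$

\[ \mathbb{P}\left[ \frac{W_{z,N}}{W_{x,N}} \le 1 - \delta \right] \le \mathbb{P}\left[ W_{x,N} \ge 1 + \frac{\delta}{2} \right] + \mathbb{P}\left[ W_{z,N} \le 1 - \frac{\delta}{2} \right] \]
\[ \le \mathbb{P}\left[ |W_{x,N} - 1| \ge \frac{\delta}{2} \right] + \mathbb{P}\left[ |W_{z,N} - 1| \ge \frac{\delta}{2} \right] \]
\[ \le 2 \sup_{x \in \mathcal{X}} \mathbb{P}_{Q_{x,N}}\left[ |W_{x,N} - 1| \ge \frac{\delta}{2} \right]. \]

□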


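Proof of Lemma 3.2. Using the inequality

\[ \min\{1, ab\} \ge \min\{1, a\} \min\{1, b\}, \quad \text{for } a, b \ge 0, \]

and applying Markov's inequality with $\delta > 0$,

\[ \tilde{\rho}_N(x) = 1 - \int_{\mathcal{X}} q(x,dz)\, \tilde{\alpha}_N(x,z) \]
\[ \le 1 - \int_{\mathcal{X}} q(x,dz)\, \alpha(x,z)\, \mathbb{E}\left[ \min\left\{ 1, \frac{W_{z,N}}{W_{x,N}} \right\} \right] \]
\[ \le 1 - (1 - \delta) \int_{\mathcal{X}} q(x,dz)\, \alpha(x,z)\, \mathbb{P}\left[ \min\left\{ 1, \frac{W_{z,N}}{W_{x,N}} \right\} > 1 - \delta \right] \]
\[ = 1 - (1 - \delta) \int_{\mathcal{X}} q(x,dz)\, \alpha(x,z) + (1 - \delta) \int_{\mathcal{X}} q(x,dz)\, \alpha(x,z)\, \mathbb{P}\left[ \frac{W_{z,N}}{W_{x,N}} \le 1 - \delta \right] \]
\[ \le 1 - (1 - \delta)(1 - \rho(x)) + \int_{\mathcal{X}} q(x,dz)\, \alpha(x,z)\, \mathbb{P}\left[ \frac{W_{z,N}}{W_{x,N}} \le 1 - \delta \right]. \]

Finally, using Lemma 3.1

\[ \tilde{\rho}_N(x) \le \rho(x) + \delta (1 - \rho(x)) + 2 \sup_{x \in \mathcal{X}} \mathbb{P}\left[ |W_{x,N} - 1| \ge \frac{\delta}{2} \right] (1 - \rho(x)) \]
\[ \le \rho(x) + \delta + 2 \sup_{x \in \mathcal{X}} \mathbb{P}\left[ |W_{x,N} - 1| \ge \frac{\delta}{2} \right]. \]

□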


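Proof of Lemma 3.3. For the first claim apply Jensen's inequality and the fact that

\[ \min\{1, ab\} \le \min\{1, a\}\, b, \quad \text{for } a \ge 0 \text{ and } b \ge 1, \]

hence

\[ \tilde{\alpha}_N(x,z) \le \min\left\{ 1, \frac{\pi(z) q(z,x)}{\pi(x) q(x,z)}\, \mathbb{E}\left[ \frac{W_{z,N}}{W_{x,N}} \right] \right\} \le \alpha(x,z)\, \mathbb{E}\left[ W_{x,N}^{-1} \right] \mathbb{E}\left[ W_{z,N} \right], \]

where the weights at $x$ and $z$ are independent and $\mathbb{E}\left[ W_{x,N}^{-1} \right] \ge 1$, again by Jensen's inequality. The result follows since $\mathbb{E}\left[ W_{z,N} \right] = 1$. □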


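Proof of Lemma 3.4. Using the inequality

\[ \min\{1, ab\} \le \min\{1, a\}\, b, \quad \text{for } a \ge 0 \text{ and } b \ge 1, \]

\[ \tilde{\alpha}_N(x,z) = \mathbb{E}\left[ \alpha_N(x, W_{x,N}; z, W_{z,N})\, 1_{\left\{ \frac{W_{z,N}}{W_{x,N}} \le 1 + \eta \right\}} \right] + \mathbb{E}\left[ \alpha_N(x, W_{x,N}; z, W_{z,N})\, 1_{\left\{ \frac{W_{z,N}}{W_{x,N}} > 1 + \eta \right\}} \right] \]
\[ \le \alpha(x,z)(1 + \eta)\, \mathbb{P}\left[ \frac{W_{z,N}}{W_{x,N}} \le 1 + \eta \right] + \mathbb{P}\left[ \frac{W_{z,N}}{W_{x,N}} > 1 + \eta \right] \]
\[ \le \alpha(x,z) + \eta + \mathbb{P}\left[ \frac{W_{z,N}}{W_{x,N}} > 1 + \eta \right]. \]

Notice that

\[ \mathbb{P}\left[ \frac{W_{z,N}}{W_{x,N}} \ge 1 + \eta \right] = \mathbb{P}\left[ \frac{W_{x,N}}{W_{z,N}} \le \frac{1}{1 + \eta} \right], \]

then applying Lemma 3.1 taking $\delta = \frac{\eta}{1+\eta}$,

\[ \tilde{\alpha}_N(x,z) \le \alpha(x,z) + \eta + 2 \sup_{x \in \mathcal{X}} \mathbb{P}\left[ |W_{x,N} - 1| \ge \frac{\eta}{2(1+\eta)} \right]. \]

□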
Proof of Proposition 3.1. Taking $V = \pi^{-s}$, where $0 < s < \min\left\{ 1, \frac{a}{L} \right\}$,

\[ \frac{qV(x)}{V(x)} = \int_{\mathcal{X}} \frac{V(z)}{V(x)}\, q(x,dz) = \int_{\mathcal{X}} \left( \frac{\pi(x)}{\pi(z)} \right)^{s} q(x,dz) \le \int_{\mathbb{R}^d} \exp\{a \|z - x\|\}\, q(\|z - x\|)\,dz. \]

Finally, using the transformation $u = z - x$,

\[ \frac{qV(x)}{V(x)} \le \int_{\mathbb{R}^d} \exp\{a \|u\|\}\, q(\|u\|)\,du, \]

which implies (P1*). □

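Proof of Proposition 3.2. By properties of the arithmetic and harmonic means

\[ \frac{1}{N} \sum_{k=1}^{N} \frac{1}{W_x^{(k)}} - \frac{N}{\sum_{k=1}^{N} W_x^{(k)}} \ge 0, \]

which implies, by Jensen's inequality,

\[ \mathbb{E}\left[ \frac{1}{N} \sum_{k=1}^{N} \frac{1}{W_x^{(k)}} - W_{x,N}^{-1} \right] \le \mathbb{E}\left[ W_x^{-1} \right] - 1. \]

Then, using Fatou's lemma and the law of large numbers

\[ \mathbb{E}\left[ W_x^{-1} \right] - 1 \ge \limsup_{N \to \infty} \mathbb{E}\left[ \frac{1}{N} \sum_{k=1}^{N} \frac{1}{W_x^{(k)}} - W_{x,N}^{-1} \right] \ge \liminf_{N \to \infty} \mathbb{E}\left[ \frac{1}{N} \sum_{k=1}^{N} \frac{1}{W_x^{(k)}} - W_{x,N}^{-1} \right] \]
\[ \ge \mathbb{E}\left[ \liminf_{N \to \infty} \left( \frac{1}{N} \sum_{k=1}^{N} \frac{1}{W_x^{(k)}} \right) - \limsup_{N \to \infty} W_{x,N}^{-1} \right] \ge \mathbb{E}\left[ W_x^{-1} \right] - 1, \]

hence

\[ \lim_{N \to \infty} \mathbb{E}\left[ \frac{1}{N} \sum_{k=1}^{N} \frac{1}{W_x^{(k)}} - W_{x,N}^{-1} \right] = \mathbb{E}\left[ W_x^{-1} \right] - 1. \tag{33} \]

Finally, since

\[ \mathbb{E}\left[ \frac{1}{N} \sum_{k=1}^{N} \frac{1}{W_x^{(k)}} - W_{x,N}^{-1} \right] = \mathbb{E}\left[ W_x^{-1} \right] - \mathbb{E}\left[ W_{x,N}^{-1} \right], \]

the expression in (33) becomes

\[ \lim_{N \to \infty} \mathbb{E}\left[ W_{x,N}^{-1} \right] = 1. \]

□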

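Proof of Lemma 3.5. The proof of (i) is motivated by Piegorsch and Casella (1985, Theorem 2.1) and
Khuri and Casella (2002, Theorem 3), however the existence of a density function is not assumed here.
Since $Z^{-p}$ is positive,

\[ \mathbb{E}\left[ Z^{-p} \right] = \int_{\mathbb{R}^+} \mathbb{P}\left[ Z^{-p} \ge z \right] dz = \int_{\mathbb{R}^+} \mathbb{P}\left[ Z \le z^{-1/p} \right] dz \]
\[ \le \gamma^{-p} + M \int_{\gamma^{-p}}^{\infty} z^{-\alpha/p}\, dz = \frac{1}{\gamma^{p}} + pM \frac{\gamma^{\alpha - p}}{\alpha - p}, \]

where the probability is bounded by one on $(0, \gamma^{-p}]$ and by $M z^{-\alpha/p}$ on $(\gamma^{-p}, \infty)$, since there $z^{-1/p} < \gamma$.

For part (ii), since the random variables $\{Z_i\}$ are positive, then for any $z > 0$

\[ \left\{ \sum_{i=1}^{N} Z_i \le z \right\} \subseteq \left\{ \max_{i \in \{1, \dots, N\}} Z_i \le z \right\}. \]

Therefore, for $z \in (0, \gamma)$

\[ \mathbb{P}\left[ \sum_{i=1}^{N} Z_i \le z \right] \le \mathbb{P}\left[ \max_{i \in \{1, \dots, N\}} Z_i \le z \right] = \prod_{i=1}^{N} \mathbb{P}\left[ Z_i \le z \right] \le \prod_{i=1}^{N} M_i z^{\alpha_i}. \]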

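Part (iii) can be seen as a consequence of $W_{x,N}$ and $W_{x,N+1}$ being convex ordered and $g(x) = x^{-p}$
being a convex function for $x > 0$ and $p \ge 0$ (see, e.g., Andrieu and Vihola, 2014). We provide a
self-contained proof by defining for $j \in \{1, \dots, N+1\}$

\[ S_{x,N}^{(j)} := \frac{1}{N} \sum_{k=1, k \ne j}^{N+1} W_x^{(k)}, \]

and we have

\[ W_{x,N+1} = \frac{1}{N+1} \sum_{j=1}^{N+1} S_{x,N}^{(j)}, \]

and since the arithmetic mean is greater than or equal to the geometric mean

\[ W_{x,N+1} \ge \prod_{j=1}^{N+1} \left( S_{x,N}^{(j)} \right)^{\frac{1}{N+1}}. \]

This implies for $p > 0$

\[ \mathbb{E}\left[ W_{x,N+1}^{-p} \right] \le \mathbb{E}\left[ \prod_{j=1}^{N+1} \left( S_{x,N}^{(j)} \right)^{-\frac{p}{N+1}} \right] \le \prod_{j=1}^{N+1} \left( \mathbb{E}\left[ \left( S_{x,N}^{(j)} \right)^{-p} \right] \right)^{\frac{1}{N+1}} = \mathbb{E}\left[ \left( S_{x,N}^{(1)} \right)^{-p} \right] = \mathbb{E}\left[ W_{x,N}^{-p} \right], \]

where Hölder's inequality has been used and the fact that the random variables $\left\{ S_{x,N}^{(j)} : j \in \{1, \dots, N+1\} \right\}$
are identically distributed according to $Q_{x,N}$.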

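For part (iv), let $M_\gamma = \sup_{y \in [\gamma, \infty)} |g(y)|$ and due to continuity at $y = 1$, for any $\varepsilon > 0$ there exists a
$\delta > 0$ such that

\[ \mathbb{E}\left[ |g(W_{x,N}) - g(1)|\, 1_{\{W_{x,N} \in [\gamma, \infty)\}} \right] \le 2 M_\gamma\, \mathbb{P}\left[ \gamma \le W_{x,N} \le 1 - \delta \right] + 2 M_\gamma\, \mathbb{P}\left[ 1 + \delta \le W_{x,N} \right] + \mathbb{E}\left[ |g(W_{x,N}) - g(1)|\, 1_{\{W_{x,N} \in (1-\delta, 1+\delta)\}} \right] \]
\[ \le 2 M_\gamma\, \mathbb{P}\left[ |W_{x,N} - 1| \ge \delta \right] + \varepsilon\, \mathbb{P}\left[ |W_{x,N} - 1| < \delta \right]. \]

Therefore, for fixed $\varepsilon$ and by (W1)

\[ \lim_{N \to \infty} \sup_{x \in \mathcal{X}} \mathbb{E}\left[ |g(W_{x,N}) - g(1)|\, 1_{\{W_{x,N} \in [\gamma, \infty)\}} \right] \le 2 M_\gamma \lim_{N \to \infty} \sup_{x \in \mathcal{X}} \mathbb{P}\left[ |W_{x,N} - 1| \ge \delta \right] + \varepsilon \le \varepsilon, \]

obtaining the result since $\varepsilon$ can be picked arbitrarily small. □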

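Proof of Proposition 3.4. First notice that if $l < \infty$ then $l \ge 1$. To see this, define

\[ a_m := \frac{\varepsilon_{m-1}}{\varepsilon_m}, \]

then for fixed $\delta > 0$, there exists $M \in \mathbb{N}$ such that for $m \ge M$

\[ a_m \le l + \delta. \]

Then, for $m > M$

\[ \frac{\varepsilon_M}{\varepsilon_m} = \prod_{j=M+1}^{m} a_j \le (l + \delta)^{m - M}, \]

and because $\varepsilon_m \to 0$, it is clear that $(l + \delta)^{m} \to \infty$ as $m \to \infty$. Therefore, $l + \delta > 1$ and since $\delta$ can be
taken arbitrarily small, it is true that $l \ge 1$.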

Now, for weights as in (21) and using a simple random walk proposal, the noisy acceptance probability 
can be expressed as 


and 


5jv(m, m — 1 ) 


N N 

EE min 

j =0 k—0 


i, 


20 0 m —lj + {N — j ) £ m -l 1 
1-0 b m k + (N - k) Em J 



(s m ) K (1 - &m— 1 ) N ° (1 ~ «m) 


\N — k 


aN(m, m + 1 ) 


• /1 1 — 0 b m+ i j + (IV — j) £ m +i 1 

Kk + (N- k )s m } 

X i(() (s m +l) J (>„)* (1 - S m+ l) N ~ J (1 - S m ) 



N-k 


(34) 


(35) 


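Since $b_m \to \infty$, then $s_m \to 0$ as $m \to \infty$; therefore, any term in (34) and (35), for which $j + k \neq 0$, tends to zero as $m \to \infty$. Hence,

\[
\tilde{\alpha}_N(m, m-1) = \min\left\{ 1, \frac{2\theta}{1-\theta} \times \frac{\varepsilon_{m-1}}{\varepsilon_m} \right\} (1 - s_{m-1})^{N} (1 - s_m)^{N} + o(1)
\]

and

\[
\tilde{\alpha}_N(m, m+1) = \min\left\{ 1, \frac{1-\theta}{2\theta} \times \frac{\varepsilon_{m+1}}{\varepsilon_m} \right\} (1 - s_{m+1})^{N} (1 - s_m)^{N} + o(1),
\]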
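implying,

\[
\lim_{m \to \infty} \frac{\tilde{P}_N(m, \{m-1\})}{\tilde{P}_N(m, \{m+1\})}
= \frac{(1-\theta) \lim_{m \to \infty} \min\left\{ 1, \frac{2\theta}{1-\theta} \times \frac{\varepsilon_{m-1}}{\varepsilon_m} \right\}}{\theta \lim_{m \to \infty} \min\left\{ 1, \frac{1-\theta}{2\theta} \times \frac{\varepsilon_{m+1}}{\varepsilon_m} \right\}}.
\tag{36}
\]

If $l = +\infty$, (36) tends to $+\infty$, whereas if $l < \infty$

\[
\lim_{m \to \infty} \frac{\tilde{P}_N(m, \{m-1\})}{\tilde{P}_N(m, \{m+1\})}
= 2l \, \frac{\min\{1-\theta, 2\theta l\}}{\min\{2\theta l, 1-\theta\}}
\ge 2.
\]

In any case, this implies

\[
\lim_{m \to \infty} \tilde{P}_N(m, \{m-1\}) \ge 2 \lim_{m \to \infty} \tilde{P}_N(m, \{m+1\}),
\]

and since

\[
\lim_{m \to \infty} \tilde{P}_N(m, \{m-1\}) = \min\{1-\theta, 2\theta l\} > 0,
\]

the noisy chain is geometrically ergodic according to Proposition A.1. □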
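
The double sums in (34) and (35) can be evaluated exactly for small $N$, which makes the role of the $j = k = 0$ term easy to see. The following is a minimal sketch under an assumption read off from the form of (34): each weight takes the value $b_m$ with probability $s_m$ and $\varepsilon_m$ otherwise, so that $N$ times the averaged weight is $b_m k + (N-k)\varepsilon_m$ with $k$ binomially distributed. The function name, argument names and numerical values are hypothetical.

```python
from math import comb


def noisy_acceptance(base_ratio, b_new, eps_new, s_new, b_old, eps_old, s_old, n):
    """Evaluate a double sum of the form (34)/(35) for two-point weights.

    base_ratio plays the role of 2*theta/(1-theta) or (1-theta)/(2*theta).
    """
    total = 0.0
    for j in range(n + 1):
        # Probability that j of the n weights at the proposed state equal b_new.
        p_j = comb(n, j) * s_new**j * (1.0 - s_new) ** (n - j)
        w_new = b_new * j + (n - j) * eps_new
        for k in range(n + 1):
            # Probability that k of the n weights at the current state equal b_old.
            p_k = comb(n, k) * s_old**k * (1.0 - s_old) ** (n - k)
            w_old = b_old * k + (n - k) * eps_old
            total += min(1.0, base_ratio * w_new / w_old) * p_j * p_k
    return total


theta, n = 0.25, 5
ratio_down = 2 * theta / (1 - theta)

# With s = 0 only the j = k = 0 term survives: min{1, ratio * eps_new / eps_old}.
limit_value = noisy_acceptance(ratio_down, 50.0, 0.1, 0.0, 60.0, 0.2, 0.0, n)

# With small positive s the value stays close to the j = k = 0 term.
near_limit = noisy_acceptance(ratio_down, 50.0, 0.1, 1e-4, 60.0, 0.2, 1e-4, n)
print(limit_value, near_limit)
```

The gap between the two values is bounded by the probability that $j + k \neq 0$, which vanishes as the success probabilities tend to zero.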
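Proof of Proposition 3.5. Noting that

\[
\frac{\varepsilon_{m-1}}{\varepsilon_m} \in
\begin{cases}
O(m^{2}) & \text{if } m \ (\mathrm{mod}\ 3) = 0, \\
O(m^{-1}) & \text{if } m \ (\mathrm{mod}\ 3) \in \{1, 2\},
\end{cases}
\qquad \text{and} \qquad
\frac{\varepsilon_{m+1}}{\varepsilon_m} \in
\begin{cases}
O(m^{-2}) & \text{if } m \ (\mathrm{mod}\ 3) = 2, \\
O(m) & \text{if } m \ (\mathrm{mod}\ 3) \in \{0, 1\},
\end{cases}
\]

expressions in (34) and (35) become

\[
\tilde{\alpha}_N(m, m-1) = (1 - s_{m-1})^{N} (1 - s_m)^{N} \mathbb{1}_{\{m \ (\mathrm{mod}\ 3) = 0\}} + O(m^{-1})
\]

and

\[
\tilde{\alpha}_N(m, m+1) = (1 - s_{m+1})^{N} (1 - s_m)^{N} \mathbb{1}_{\{m \ (\mathrm{mod}\ 3) \in \{0, 1\}\}} + O(m^{-1}).
\]

Therefore,

\[
\begin{aligned}
\frac{\tilde{P}_N(m, \{m-1\})}{\tilde{P}_N(m, \{m+1\})}
&= \left( \frac{1-\theta}{\theta} \right) \frac{(1 - s_{m-1})^{N} + O(m^{-1})}{(1 - s_{m+1})^{N} + O(m^{-1})} \, \mathbb{1}_{\{m \ (\mathrm{mod}\ 3) = 0\}} \\
&\quad + O(m^{-1}) \, \mathbb{1}_{\{m \ (\mathrm{mod}\ 3) = 1\}} \\
&\quad + O(1) \, \mathbb{1}_{\{m \ (\mathrm{mod}\ 3) = 2\}},
\end{aligned}
\]

implying there exists $C \in \mathbb{R}^{+}$ such that for $j = 0, 2$

\[
\lim_{k \to \infty} \frac{\tilde{P}_N(3k+j, \{3k+j-1\})}{\tilde{P}_N(3k+j, \{3k+j+1\})} \le C
\]

and

\[
\lim_{k \to \infty} \frac{\tilde{P}_N(3k+1, \{3k\})}{\tilde{P}_N(3k+1, \{3k+2\})} = 0.
\]

Then, for fixed $\delta > 0$ there exists $k_0 \in \mathbb{N}^{+}$ such that whenever $k \ge k_0$

\[
\frac{\tilde{P}_N(3k+j, \{3k+j-1\})}{\tilde{P}_N(3k+j, \{3k+j+1\})} < C + \delta, \quad \text{for } j = 0, 2,
\]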
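and

\[
\frac{\tilde{P}_N(3k+1, \{3k\})}{\tilde{P}_N(3k+1, \{3k+2\})} < \delta.
\]

Let

\[
c_m := \frac{\tilde{P}_N(m, \{m-1\})}{\tilde{P}_N(m, \{m+1\})},
\]

then for $k \ge k_0 + 1$

\[
\prod_{j=2}^{3k+1} c_j = \prod_{j=1}^{k} c_{3j-1} c_{3j} c_{3j+1}
< \left( (C + \delta)^2 \delta \right)^{k - k_0} \prod_{j=1}^{k_0} c_{3j-1} c_{3j} c_{3j+1}.
\]

Take $\delta$ small enough, such that $(C + \delta)^2 \delta < 1$, hence

\[
\begin{aligned}
\sum_{k=1}^{\infty} \prod_{j=2}^{3k+1} c_j
&= \sum_{k=1}^{k_0} \prod_{j=2}^{3k+1} c_j + \sum_{k=k_0+1}^{\infty} \prod_{j=2}^{3k+1} c_j \\
&\le \sum_{k=1}^{k_0} \prod_{j=2}^{3k+1} c_j + \prod_{j=1}^{k_0} c_{3j-1} c_{3j} c_{3j+1} \sum_{k=k_0}^{\infty} \left( (C + \delta)^2 \delta \right)^{k - k_0} \\
&= \sum_{k=1}^{k_0} \prod_{j=2}^{3k+1} c_j + \frac{\prod_{j=1}^{k_0} c_{3j-1} c_{3j} c_{3j+1}}{1 - (C + \delta)^2 \delta} \\
&< \infty.
\end{aligned}
\]

Similarly, it can be proved that

\[
\sum_{k=0}^{\infty} \prod_{j=2}^{3k+2} c_j < \infty
\qquad \text{and} \qquad
\sum_{k=1}^{\infty} \prod_{j=2}^{3k} c_j < \infty,
\]

thus

\[
\sum_{m=2}^{\infty} \prod_{j=2}^{m} c_j < \infty,
\]

implying the noisy chain is transient according to Proposition A.1.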
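
The grouping of the products into blocks of three can be illustrated with a small numerical sketch. It assumes hypothetical ratios $c_m$ that sit exactly at a bound `C_BOUND` when $m \ (\mathrm{mod}\ 3) \in \{0, 2\}$ and at a small value `SMALL` when $m \ (\mathrm{mod}\ 3) = 1$; these constants and the helper name are illustrative and are not taken from the example in the text.

```python
def partial_sum_of_products(c, upper):
    """Return sum_{m=2}^{upper} prod_{j=2}^{m} c(j) for a ratio function c."""
    total, product = 0.0, 1.0
    for m in range(2, upper + 1):
        product *= c(m)
        total += product
    return total


# Hypothetical ratios: large for two of every three states, small for the third.
C_BOUND, SMALL = 3.0, 0.05  # (C_BOUND ** 2) * SMALL = 0.45 < 1


def c(m):
    return SMALL if m % 3 == 1 else C_BOUND


# Each block c_{3j-1} c_{3j} c_{3j+1} contributes a factor r < 1,
# so the three sub-series are geometric and the total has a closed form.
r = C_BOUND**2 * SMALL
closed_form = (C_BOUND + C_BOUND**2 + r) / (1.0 - r)
partial = partial_sum_of_products(c, 300)
print(partial, closed_form)
```

Even though two of every three ratios exceed one, the series converges because each block of three consecutive ratios has product below one.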

A.4 Section 4 

Proof of Proposition 4.1. From (23) and taking $\delta < \frac{1}{2}$, $\eta = \ldots$


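\[
\sup_{x \in \mathcal{X}} \left\| \tilde{P}_N(x, \cdot) - P(x, \cdot) \right\|_{TV}
\le 3\delta + 4 \sup_{x \in \mathcal{X}} \mathbb{P}\left[ |W_{x,N} - 1| \ge \frac{\delta}{2} \right].
\]

Using Markov's inequality

\[
\begin{aligned}
\sup_{x \in \mathcal{X}} \left\| \tilde{P}_N(x, \cdot) - P(x, \cdot) \right\|_{TV}
&\le 3\delta + 4 \sup_{x \in \mathcal{X}} \mathbb{P}\left[ |W_{x,N} - 1|^{1+k} \ge \left( \frac{\delta}{2} \right)^{1+k} \right] \\
&\le 3\delta + \frac{2^{3+k}}{\delta^{1+k}} \sup_{x \in \mathcal{X}} \mathbb{E}\left[ |W_{x,N} - 1|^{1+k} \right] \\
&\le 3\delta + \frac{2^{3+k}}{\delta^{1+k} N^{k}} \sup_{x \in \mathcal{X}} \mathbb{E}\left[ |W_{x} - 1|^{1+k} \right].
\end{aligned}
\]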
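Now, let

\[
C_k = \sup_{x \in \mathcal{X}} \mathbb{E}\left[ |W_{x} - 1|^{1+k} \right],
\]

then the convex function $f : \mathbb{R}^{+} \to \mathbb{R}^{+}$ where

\[
f(s) = 3s + \frac{2^{3+k} C_k}{s^{1+k} N^{k}},
\]

is minimised at

\[
s^{*} = \left( \frac{(1+k) 2^{3+k} C_k}{3 N^{k}} \right)^{\frac{1}{2+k}}
= O\left( N^{-\frac{k}{2+k}} \right).
\]

Then,

\[
\begin{aligned}
\sup_{x \in \mathcal{X}} \left\| \tilde{P}_N(x, \cdot) - P(x, \cdot) \right\|_{TV}
&\le f(s^{*}) \\
&= O\left( N^{-\frac{k}{2+k}} \right) + O\left( N^{-\frac{k}{2+k}} \right) \\
&= O\left( N^{-\frac{k}{2+k}} \right).
\end{aligned}
\]

Applying Theorem 4.2 by taking

\[
r(N) \propto N^{\frac{k}{2+k}}
\]

and noting $\log\left( N^{\frac{k}{2+k}} \right) < \log(N)$, the result is obtained.

For the second claim, for a given $\varepsilon \in (0,1)$ take $k_\varepsilon \ge 2\left( \varepsilon^{-1} - 1 \right)$ and apply the first part.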


□ 
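
The optimisation over $s$ in the proof above can be reproduced numerically. The sketch below evaluates $f$ at its stationary point for two values of $N$ and recovers the decay exponent $-k/(2+k)$; it also checks that $k_\varepsilon = 2(\varepsilon^{-1} - 1)$ gives the exponent $1 - \varepsilon$. The values of `k` and `c_k` and the function names are hypothetical, and the constraint on $\delta$ is ignored for simplicity.

```python
import math


def bound(s, n, k, c_k):
    """f(s) = 3 s + 2^(3+k) C_k / (s^(1+k) N^k)."""
    return 3.0 * s + 2.0 ** (3 + k) * c_k / (s ** (1 + k) * n**k)


def minimiser(n, k, c_k):
    """Stationary point of f, obtained by solving f'(s) = 0."""
    return ((1 + k) * 2.0 ** (3 + k) * c_k / (3.0 * n**k)) ** (1.0 / (2 + k))


def k_for_rate(eps):
    """Smallest k for which k / (2 + k) >= 1 - eps."""
    return 2.0 * (1.0 / eps - 1.0)


k, c_k = 1.0, 0.5  # hypothetical moment order and moment bound
n_small, n_large = 10**3, 10**6
s_small = minimiser(n_small, k, c_k)
f_small = bound(s_small, n_small, k, c_k)
f_large = bound(minimiser(n_large, k, c_k), n_large, k, c_k)

# Decay exponent of f(s*) in N; it should match -k / (2 + k).
exponent = math.log(f_large / f_small) / math.log(n_large / n_small)
print(exponent, -k / (2 + k))

# Moment order needed for a rate of N^(-(1 - eps)) with eps = 0.1.
k_eps = k_for_rate(0.1)
print(k_eps, k_eps / (2 + k_eps))
```

Larger $k$, that is more finite moments of the weights, pushes the exponent $k/(2+k)$ towards one.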